summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2017-10-22Merge 4.4.94 into android-4.4Greg Kroah-Hartman
Changes in 4.4.94 percpu: make this_cpu_generic_read() atomic w.r.t. interrupts drm/dp/mst: save vcpi with payloads MIPS: Fix minimum alignment requirement of IRQ stack sctp: potential read out of bounds in sctp_ulpevent_type_enabled() bpf/verifier: reject BPF_ALU64|BPF_END udpv6: Fix the checksum computation when HW checksum does not apply ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header net: emac: Fix napi poll list corruption packet: hold bind lock when rebinding to fanout hook bpf: one perf event close won't free bpf program attached by another perf event isdn/i4l: fetch the ppp_write buffer in one shot vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit l2tp: Avoid schedule while atomic in exit_net l2tp: fix race condition in l2tp_tunnel_delete tun: bail out from tun_get_user() if the skb is empty packet: in packet_do_bind, test fanout with bind_lock held packet: only test po->has_vnet_hdr once in packet_snd net: Set sk_prot_creator when cloning sockets to the right proto tipc: use only positive error codes in messages Revert "bsg-lib: don't free job in bsg_prepare_job" locking/lockdep: Add nest_lock integrity test watchdog: kempld: fix gcc-4.3 build irqchip/crossbar: Fix incorrect type of local variables mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length mac80211: fix power saving clients handling in iwlwifi net/mlx4_en: fix overflow in mlx4_en_init_timestamp() netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value. iio: adc: xilinx: Fix error handling Btrfs: send, fix failure to rename top level inode due to name collision f2fs: do not wait for writeback in write_begin md/linear: shutup lockdep warnning sparc64: Migrate hvcons irq to panicked cpu net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs crypto: xts - Add ECB dependency ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock slub: do not merge cache if slub_debug contains a never-merge flag scsi: scsi_dh_emc: return success in clariion_std_inquiry() net: mvpp2: release reference to txq_cpu[] entry after unmapping i2c: at91: ensure state is restored after suspending ceph: clean up unsafe d_parent accesses in build_dentry_path uapi: fix linux/rds.h userspace compilation errors uapi: fix linux/mroute6.h userspace compilation errors target/iscsi: Fix unsolicited data seq_end_offset calculation nfsd/callback: Cleanup callback cred on shutdown cpufreq: CPPC: add ACPI_PROCESSOR dependency Revert "tty: goldfish: Fix a parameter of a call to free_irq" Linux 4.4.94 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-21Revert "bsg-lib: don't free job in bsg_prepare_job"Greg Kroah-Hartman
This reverts commit 668cee82cd28d2c38a99f7cbddf3b3fd58f257b9 which was commit f507b54dccfd8000c517d740bc45f20c74532d18 upstream. Ben reports: That function doesn't exist here (it was introduced in 4.13). Instead, this backport has modified bsg_create_job(), creating a leak. Please revert this on the 3.18, 4.4 and 4.9 stable branches. So I'm dropping it from here. Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
2017-10-19Merge 4.4.93 into android-4.4Dmitry Shmidt
Changes in 4.4.93 brcmfmac: add length check in brcmf_cfg80211_escan_handler() ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets CIFS: Reconnect expired SMB sessions nl80211: Define policy for packet pattern attributes iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD rcu: Allow for page faults in NMI handlers USB: dummy-hcd: Fix deadlock caused by disconnect detection MIPS: math-emu: Remove pr_err() calls from fpu_emu() dmaengine: edma: Align the memcpy acnt array size with the transfer HID: usbhid: fix out-of-bounds bug crypto: shash - Fix zero-length shash ahash digest crash KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet iommu/amd: Finish TLB flush in amd_iommu_unmap() ALSA: usb-audio: Kill stray URB at exiting ALSA: seq: Fix use-after-free at creating a port ALSA: seq: Fix copy_from_user() call inside lock ALSA: caiaq: Fix stray URB at probe error path ALSA: line6: Fix leftover URB at error-path during probe usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options direct-io: Prevent NULL pointer access in submit_page_section fix unbalanced page refcounting in bio_map_user_iov USB: serial: ftdi_sio: add id for Cypress WICED dev board USB: serial: cp210x: add support for ELV TFD500 USB: serial: option: add support for TP-Link LTE module USB: serial: qcserial: add Dell DW5818, DW5819 USB: serial: console: fix use-after-free after failed setup x86/alternatives: Fix alt_max_short macro to really be a max() Linux 4.4.93 Change-Id: I731bf1eef5aca9728dddd23bfbe407f0c6ff2d84 Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2017-10-18fix unbalanced page refcounting in bio_map_user_iovVitaly Mayatskikh
commit 95d78c28b5a85bacbc29b8dba7c04babb9b0d467 upstream. bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if IO vector has small consecutive buffers belonging to the same page. bio_add_pc_page merges them into one, but the page reference is never dropped. Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08Merge 4.4.91 into android-4.4Greg Kroah-Hartman
Changes in 4.4.91 drm_fourcc: Fix DRM_FORMAT_MOD_LINEAR #define drm: bridge: add DT bindings for TI ths8135 GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next RDS: RDMA: Fix the composite message user notification ARM: dts: r8a7790: Use R-Car Gen 2 fallback binding for msiof nodes MIPS: Ensure bss section ends on a long-aligned address MIPS: ralink: Fix incorrect assignment on ralink_soc igb: re-assign hw address pointer on reset after PCI error extcon: axp288: Use vbus-valid instead of -present to determine cable presence sh_eth: use correct name for ECMR_MPDE bit hwmon: (gl520sm) Fix overflows and crash seen when writing into limit attributes iio: adc: axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications iio: adc: hx711: Add DT binding for avia,hx711 ARM: 8635/1: nommu: allow enabling REMAP_VECTORS_TO_RAM tty: goldfish: Fix a parameter of a call to free_irq IB/ipoib: Fix deadlock over vlan_mutex IB/ipoib: rtnl_unlock can not come after free_netdev IB/ipoib: Replace list_del of the neigh->list with list_del_init drm/amdkfd: fix improper return value on error USB: serial: mos7720: fix control-message error handling USB: serial: mos7840: fix control-message error handling partitions/efi: Fix integer overflow in GPT size calculation ASoC: dapm: handle probe deferrals audit: log 32-bit socketcalls usb: chipidea: vbus event may exist before starting gadget ASoC: dapm: fix some pointer error handling MIPS: Lantiq: Fix another request_mem_region() return code check net: core: Prevent from dereferencing null pointer when releasing SKB net/packet: check length in getsockopt() called with PACKET_HDRLEN team: fix memory leaks usb: plusb: Add support for PL-27A1 mmc: sdio: fix alignment issue in struct sdio_func bridge: netlink: register netdevice before executing changelink netfilter: invoke synchronize_rcu after set the _hook_ to NULL MIPS: IRQ Stack: Unwind IRQ stack onto task stack exynos-gsc: Do not swap cb/cr for semi planar formats netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max parisc: perf: Fix potential NULL pointer dereference iommu/io-pgtable-arm: Check for leaf entry before dereferencing it rds: ib: add error handle md/raid10: submit bio directly to replacement disk i2c: meson: fix wrong variable usage in meson_i2c_put_data xfs: remove kmem_zalloc_greedy libata: transport: Remove circular dependency at free time drivers: firmware: psci: drop duplicate const from psci_of_match IB/qib: fix false-postive maybe-uninitialized warning ARM: remove duplicate 'const' annotations' ALSA: au88x0: avoid theoretical uninitialized access ttpci: address stringop overflow warning Linux 4.4.91 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-08partitions/efi: Fix integer overflow in GPT size calculationAlden Tondettar
[ Upstream commit c5082b70adfe8e1ea1cf4a8eff92c9f260e364d2 ] If a GUID Partition Table claims to have more than 2**25 entries, the calculation of the partition table size in alloc_read_gpt_entries() will overflow a 32-bit integer and not enough space will be allocated for the table. Nothing seems to get written out of bounds, but later efi_partition() will read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing information to /proc/partitions and uevents. The problem exists on both 64-bit and 32-bit platforms. Fix the overflow and also print a meaningful debug message if the table size is too large. Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05Merge 4.4.90 into android-4.4Greg Kroah-Hartman
Changes in 4.4.90 cifs: release auth_key.response for reconnect. mac80211: flush hw_roc_start work before cancelling the ROC KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() tracing: Fix trace_pipe behavior for instance traces tracing: Erase irqsoff trace with empty write md/raid5: fix a race condition in stripe batch md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly crypto: talitos - Don't provide setkey for non hmac hashing algs. crypto: talitos - fix sha224 KEYS: fix writing past end of user-supplied buffer in keyring_read() KEYS: prevent creating a different user's keyrings KEYS: prevent KEYCTL_READ on negative key powerpc/pseries: Fix parent_dn reference leak in add_dt_node() Fix SMB3.1.1 guest authentication to Samba SMB: Validate negotiate (to protect against downgrade) even if signing off SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets nl80211: check for the required netlink attributes presence bsg-lib: don't free job in bsg_prepare_job seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() arm64: Make sure SPsel is always set arm64: fault: Route pte translation faults via do_translation_fault KVM: VMX: Do not BUG() on out-of-bounds guest IRQ kvm: nVMX: Don't allow L2 to access the hardware CR8 PCI: Fix race condition with driver_override btrfs: fix NULL pointer dereference from free_reloc_roots() btrfs: propagate error to btrfs_cmp_data_prepare caller btrfs: prevent to set invalid default subvolid x86/fpu: Don't let userspace set bogus xcomp_bv gfs2: Fix debugfs glocks dump timer/sysclt: Restrict timer migration sysctl values to 0 and 1 KVM: VMX: do not change SN bit in vmx_update_pi_irte() KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt cxl: Fix driver use count dmaengine: mmp-pdma: add number of requestors ARM: pxa: add the number of DMA requestor lines ARM: pxa: fix the number of DMA requestor lines KVM: VMX: use cmpxchg64 video: fbdev: aty: do not leak uninitialized padding in clk to userspace swiotlb-xen: implement xen_swiotlb_dma_mmap callback fix xen_swiotlb_dma_mmap prototype Linux 4.4.90 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-05bsg-lib: don't free job in bsg_prepare_jobChristoph Hellwig
commit f507b54dccfd8000c517d740bc45f20c74532d18 upstream. The job structure is allocated as part of the request, so we should not free it in the error path of bsg_prepare_job. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27Merge 4.4.89 into android-4.4Greg Kroah-Hartman
Changes in 4.4.89 ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt() ipv6: add rcu grace period before freeing fib6_node ipv6: fix sparse warning on rt6i_node qlge: avoid memcpy buffer overflow Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()" tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0 Revert "net: use lib/percpu_counter API for fragmentation mem accounting" Revert "net: fix percpu memory leaks" gianfar: Fix Tx flow control deactivation ipv6: fix memory leak with multiple tables during netns destruction ipv6: fix typo in fib6_net_exit() f2fs: check hot_data for roll-forward recovery x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps md/raid5: release/flush io in raid5_do_work() nfsd: Fix general protection fault in release_lock_stateid() mm: prevent double decrease of nr_reserved_highatomic tty: improve tty_insert_flip_char() fast path tty: improve tty_insert_flip_char() slow path tty: fix __tty_insert_flip_char regression Input: i8042 - add Gigabyte P57 to the keyboard reset table MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs crypto: AF_ALG - remove SGL terminator indicator when chaining ext4: fix incorrect quotaoff if the quota feature is enabled ext4: fix quota inconsistency during orphan cleanup for read-only mounts powerpc: Fix DAR reporting when alignment handler faults block: Relax a check in blk_start_queue() md/bitmap: disable bitmap_resize for file-backed bitmaps. skd: Avoid that module unloading triggers a use-after-free skd: Submit requests to firmware before triggering the doorbell scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA scsi: zfcp: fix missing trace records for early returns in TMF eh handlers scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response scsi: zfcp: trace high part of "new" 64 bit SCSI LUN scsi: megaraid_sas: Check valid aen class range to avoid kernel panic scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead scsi: storvsc: fix memory leak on ring buffer busy scsi: sg: remove 'save_scat_len' scsi: sg: use standard lists for sg_requests scsi: sg: off by one in sg_ioctl() scsi: sg: factor out sg_fill_request_table() scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE scsi: qla2xxx: Fix an integer overflow in sysfs code ftrace: Fix selftest goto location on error tracing: Apply trace_clock changes to instance max buffer ARC: Re-enable MMU upon Machine Check exception PCI: shpchp: Enable bridge bus mastering if MSI is enabled media: v4l2-compat-ioctl32: Fix timespec conversion media: uvcvideo: Prevent heap overflow when accessing mapped controls bcache: initialize dirty stripes in flash_dev_run() bcache: Fix leak of bdev reference bcache: do not subtract sectors_to_gc for bypassed IO bcache: correct cache_dirty_target in __update_writeback_rate() bcache: Correct return value for sysfs attach errors bcache: fix for gc and write-back race bcache: fix bch_hprint crash and improve output ftrace: Fix memleak when unregistering dynamic ops when tracing disabled Linux 4.4.89 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-27block: Relax a check in blk_start_queue()Bart Van Assche
commit 4ddd56b003f251091a67c15ae3fe4a5c5c5e390a upstream. Calling blk_start_queue() from interrupt context with the queue lock held and without disabling IRQs, as the skd driver does, is safe. This patch avoids that loading the skd driver triggers the following warning: WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0 RIP: 0010:blk_start_queue+0x84/0xa0 Call Trace: skd_unquiesce_dev+0x12a/0x1d0 [skd] skd_complete_internal+0x1e7/0x5a0 [skd] skd_complete_other+0xc2/0xd0 [skd] skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd] skd_isr+0x14f/0x180 [skd] irq_forced_thread_fn+0x2a/0x70 irq_thread+0x144/0x1a0 kthread+0x125/0x140 ret_from_fork+0x2a/0x40 Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Andrew Morton <akpm@osdl.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05Merge 4.4.76 into android-4.4Greg Kroah-Hartman
Changes in 4.4.76 ipv6: release dst on error in ip6_dst_lookup_tail net: don't call strlen on non-terminated string in dev_set_alias() decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb net: Zero ifla_vf_info in rtnl_fill_vfinfo() af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers Fix an intermittent pr_emerg warning about lo becoming free. net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx igmp: acquire pmc lock for ip_mc_clear_src() igmp: add a missing spin_lock_init() ipv6: fix calling in6_ifa_hold incorrectly for dad work net/mlx5: Wait for FW readiness before initializing command interface decnet: always not take dst->__refcnt when inserting dst into hash table net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev sfc: provide dummy definitions of vswitch functions ipv6: Do not leak throw route references rtnetlink: add IFLA_GROUP to ifla_policy netfilter: xt_TCPMSS: add more sanity tests on tcph->doff netfilter: synproxy: fix conntrackd interaction NFSv4: fix a reference leak caused WARNING messages drm/ast: Handle configuration without P2A bridge mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() MIPS: Avoid accidental raw backtrace MIPS: pm-cps: Drop manual cache-line alignment of ready_count MIPS: Fix IRQ tracing & lockdep when rescheduling ALSA: hda - Fix endless loop of codec configure ALSA: hda - set input_path bitmap to zero after moving it to new place drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr usb: gadget: f_fs: Fix possibe deadlock sysctl: enable strict writes block: fix module reference leak on put_disk() call for cgroups throttle mm: numa: avoid waiting on freed migrated pages KVM: x86: fix fixing of hypercalls scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type scsi: lpfc: Set elsiocb contexts to NULL after freeing it qla2xxx: Fix erroneous invalid handle message ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags net: mvneta: Fix for_each_present_cpu usage MIPS: ath79: fix regression in PCI window initialization net: korina: Fix NAPI versus resources freeing MIPS: ralink: MT7688 pinmux fixes MIPS: ralink: fix USB frequency scaling MIPS: ralink: Fix invalid assignment of SoC type MIPS: ralink: fix MT7628 pinmux typos MIPS: ralink: fix MT7628 wled_an pinmux gpio mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only bgmac: fix a missing check for build_skb mtd: bcm47xxpart: don't fail because of bit-flips bgmac: Fix reversed test of build_skb() return value. net: bgmac: Fix SOF bit checking net: bgmac: Start transmit queue in bgmac_open net: bgmac: Remove superflous netif_carrier_on() powerpc/eeh: Enable IO path on permanent error gianfar: Do not reuse pages from emergency reserve Btrfs: fix truncate down when no_holes feature is enabled virtio_console: fix a crash in config_work_handler swiotlb-xen: update dev_addr after swapping pages xen-netfront: Fix Rx stall during network stress and OOM scsi: virtio_scsi: Reject commands when virtqueue is broken platform/x86: ideapad-laptop: handle ACPI event 1 amd-xgbe: Check xgbe_init() return code net: dsa: Check return value of phy_connect_direct() drm/amdgpu: check ring being ready before using vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null virtio_net: fix PAGE_SIZE > 64k vxlan: do not age static remote mac entries ibmveth: Add a proper check for the availability of the checksum features kernel/panic.c: add missing \n HID: i2c-hid: Add sleep between POWER ON and RESET scsi: lpfc: avoid double free of resource identifiers spi: davinci: use dma_mapping_error() mac80211: initialize SMPS field in HT capabilities x86/mpx: Use compatible types in comparison to fix sparse error coredump: Ensure proper size of sparse core files swiotlb: ensure that page-sized mappings are page-aligned s390/ctl_reg: make __ctl_load a full memory barrier be2net: fix status check in be_cmd_pmac_add() perf probe: Fix to show correct locations for events on modules net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV sctp: check af before verify address in sctp_addr_id2transport ravb: Fix use-after-free on `ifconfig eth0 down` jump label: fix passing kbuild_cflags when checking for asm goto support xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY xfrm: NULL dereference on allocation failure xfrm: Oops on error in pfkey_msg2xfrm_state() watchdog: bcm281xx: Fix use of uninitialized spinlock. sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation ARM: 8685/1: ensure memblock-limit is pmd-aligned x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space x86/mm: Fix flush_tlb_page() on Xen ocfs2: o2hb: revert hb threshold to keep compatible iommu/vt-d: Don't over-free page table directories iommu: Handle default domain attach failure iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid() cpufreq: s3c2416: double free on driver init error path KVM: x86: fix emulation of RSM and IRET instructions KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh() KVM: x86: zero base3 of unusable segments KVM: nVMX: Fix exception injection Linux 4.4.76 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-05block: fix module reference leak on put_disk() call for cgroups throttleRoman Pen
commit 39a169b62b415390398291080dafe63aec751e0a upstream. get_disk(),get_gendisk() calls have non explicit side effect: they increase the reference on the disk owner module. The following is the correct sequence how to get a disk reference and to put it: disk = get_gendisk(...); /* use disk */ owner = disk->fops->owner; put_disk(disk); module_put(owner); fs/block_dev.c is aware of this required module_put() call, but f.e. blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put a module reference. To see a leakage in action cgroups throttle config can be used. In the following script I'm removing throttle for /dev/ram0 (actually this is NOP, because throttle was never set for this device): # lsmod | grep brd brd 5175 0 # i=100; while [ $i -gt 0 ]; do echo "1:0 0" > \ /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \ done # lsmod | grep brd brd 5175 100 Now brd module has 100 references. The issue is fixed by calling module_put() just right away put_disk(). Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gi-Oh Kim <gi-oh.kim@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-27Merge 4.4.73 into android-4.4Greg Kroah-Hartman
Changes in 4.4.73 s390/vmem: fix identity mapping partitions/msdos: FreeBSD UFS2 file systems are not recognized ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory. Call echo service immediately after socket reconnect net: xilinx_emaclite: fix freezes due to unordered I/O net: xilinx_emaclite: fix receive buffer overflow ipv6: Handle IPv4-mapped src to in6addr_any dst. ipv6: Inhibit IPv4-mapped src address on the wire. NET: Fix /proc/net/arp for AX.25 NET: mkiss: Fix panic net: hns: Fix the device being used for dma mapping during TX sierra_net: Skip validating irrelevant fields for IDLE LSIs sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications i2c: piix4: Fix request_region size ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches PM / runtime: Avoid false-positive warnings from might_sleep_if() jump label: pass kbuild_cflags when checking for asm goto support kasan: respect /proc/sys/kernel/traceoff_on_warning log2: make order_base_2() behave correctly on const input value zero ethtool: do not vzalloc(0) on registers dump fscache: Fix dead object requeue fscache: Clear outstanding writes when disabling a cookie FS-Cache: Initialise stores_lock in netfs cookie ipv6: fix flow labels when the traffic class is non-0 drm/nouveau: prevent userspace from deleting client object drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers net/mlx4_core: Avoid command timeouts during VF driver device shutdown gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES net: adaptec: starfire: add checks for dma mapping errors parisc, parport_gsc: Fixes for printk continuation lines drm/nouveau: Don't enabling polling twice on runtime resume drm/ast: Fixed system hanged if disable P2A ravb: unmap descriptors when freeing rings nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED" r8152: re-schedule napi for tx r8152: fix rtl8152_post_reset function r8152: avoid start_xmit to schedule napi when napi is disabled sctp: sctp_addr_id2transport should verify the addr before looking up assoc romfs: use different way to generate fsid for BLOCK or MTD proc: add a schedule point in proc_pid_readdir() tipc: ignore requests when the connection state is not CONNECTED xtensa: don't use linux IRQ #0 s390/kvm: do not rely on the ILC on kvm host protection fauls sparc64: make string buffers large enough Linux 4.4.73 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-17partitions/msdos: FreeBSD UFS2 file systems are not recognizedRichard
commit 223220356d5ebc05ead9a8d697abb0c0a906fc81 upstream. The code in block/partitions/msdos.c recognizes FreeBSD, OpenBSD and NetBSD partitions and does a reasonable job picking out OpenBSD and NetBSD UFS subpartitions. But for FreeBSD the subpartitions are always "bad". Kernel: <bsd:bad subpartition - ignored Though all 3 of these BSD systems use UFS as a file system, only FreeBSD uses relative start addresses in the subpartition declarations. The following patch fixes this for FreeBSD partitions and leaves the code for OpenBSD and NetBSD intact: Signed-off-by: Richard Narron <comet.berkeley@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-21Merge 4.4.69 into android-4.4Greg Kroah-Hartman
Changes in 4.4.69 xen: adjust early dom0 p2m handling to xen hypervisor behavior target: Fix compare_and_write_callback handling for non GOOD status target/fileio: Fix zero-length READ and WRITE handling target: Convert ACL change queue_depth se_session reference usage iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement usb: host: xhci: print correct command ring address USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit USB: Proper handling of Race Condition when two USB class drivers try to call init_usb_class simultaneously staging: vt6656: use off stack for in buffer USB transfers. staging: vt6656: use off stack for out buffer USB transfers. staging: gdm724x: gdm_mux: fix use-after-free on module unload staging: comedi: jr3_pci: fix possible null pointer dereference staging: comedi: jr3_pci: cope with jiffies wraparound usb: misc: add missing continue in switch usb: Make sure usb/phy/of gets built-in usb: hub: Fix error loop seen after hub communication errors usb: hub: Do not attempt to autosuspend disconnected devices x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug x86, pmem: Fix cache flushing for iovec write < 8 bytes um: Fix PTRACE_POKEUSER on x86_64 KVM: x86: fix user triggerable warning in kvm_apic_accept_events() KVM: arm/arm64: fix races in kvm_psci_vcpu_on block: fix blk_integrity_register to use template's interval_exp if not 0 crypto: algif_aead - Require setkey before accept(2) dm era: save spacemap metadata root after the pre-commit vfio/type1: Remove locked page accounting workqueue IB/core: Fix sysfs registration error flow IB/IPoIB: ibX: failed to create mcg debug file IB/mlx4: Fix ib device initialization error flow IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level ext4: evict inline data when writing to memory map fs/xattr.c: zero out memory copied to userspace in getxattr ceph: fix memory leak in __ceph_setxattr() fs/block_dev: always invalidate cleancache in invalidate_bdev() Set unicode flag on cifs echo request to avoid Mac error SMB3: Work around mount failure when using SMB3 dialect to Macs CIFS: fix mapping of SFM_SPACE and SFM_PERIOD cifs: fix CIFS_IOC_GET_MNT_INFO oops CIFS: add misssing SFM mapping for doublequote padata: free correct variable arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses serial: samsung: Use right device for DMA-mapping calls serial: omap: fix runtime-pm handling on unbind serial: omap: suspend device on probe errors tty: pty: Fix ldisc flush after userspace become aware of the data already Bluetooth: Fix user channel for 32bit userspace on 64bit kernel Bluetooth: hci_bcm: add missing tty-device sanity check Bluetooth: hci_intel: add missing tty-device sanity check mac80211: pass RX aggregation window size to driver mac80211: pass block ack session timeout to to driver mac80211: RX BA support for sta max_rx_aggregation_subframes wlcore: Pass win_size taken from ieee80211_sta to FW wlcore: Add RX_BA_WIN_SIZE_CHANGE_EVENT event ipmi: Fix kernel panic at ipmi_ssif_thread() Linux 4.4.69 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-05-20block: fix blk_integrity_register to use template's interval_exp if not 0Mike Snitzer
commit 2859323e35ab5fc42f351fbda23ab544eaa85945 upstream. When registering an integrity profile: if the template's interval_exp is not 0 use it, otherwise use the ilog2() of logical block size of the provided gendisk. This fixes a long-standing DM linear target bug where it cannot pass integrity data to the underlying device if its logical block size conflicts with the underlying device's logical block size. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-15Merge 4.4.68 into android-4.4Greg Kroah-Hartman
Changes in 4.4.68 9p: fix a potential acl leak ARM: 8452/3: PJ4: make coprocessor access sequences buildable in Thumb2 mode cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores powerpc/powernv: Fix opal_exit tracepoint opcode power: supply: bq24190_charger: Fix irq trigger to IRQF_TRIGGER_FALLING power: supply: bq24190_charger: Call set_mode_host() on pm_resume() power: supply: bq24190_charger: Install irq_handler_thread() at end of probe() power: supply: bq24190_charger: Call power_supply_changed() for relevant component power: supply: bq24190_charger: Don't read fault register outside irq_handle_thread() power: supply: bq24190_charger: Handle fault before status on interrupt leds: ktd2692: avoid harmless maybe-uninitialized warning ARM: OMAP5 / DRA7: Fix HYP mode boot for thumb2 build mwifiex: debugfs: Fix (sometimes) off-by-1 SSID print mwifiex: remove redundant dma padding in AMSDU mwifiex: Avoid skipping WEP key deletion for AP x86/ioapic: Restore IO-APIC irq_chip retrigger callback x86/pci-calgary: Fix iommu_free() comparison of unsigned expression >= 0 clk: Make x86/ conditional on CONFIG_COMMON_CLK kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed x86/platform/intel-mid: Correct MSI IRQ line for watchdog device Revert "KVM: nested VMX: disable perf cpuid reporting" KVM: nVMX: initialize PML fields in vmcs02 KVM: nVMX: do not leak PML full vmexit to L1 usb: host: ehci-exynos: Decrese node refcount on exynos_ehci_get_phy() error paths usb: host: ohci-exynos: Decrese node refcount on exynos_ehci_get_phy() error paths usb: chipidea: Only read/write OTGSC from one place usb: chipidea: Handle extcon events properly USB: serial: keyspan_pda: fix receive sanity checks USB: serial: digi_acceleport: fix incomplete rx sanity check USB: serial: ssu100: fix control-message error handling USB: serial: io_edgeport: fix epic-descriptor handling USB: serial: ti_usb_3410_5052: fix control-message error handling USB: serial: ark3116: fix open error handling USB: serial: ftdi_sio: fix latency-timer error handling USB: serial: quatech2: fix control-message error handling USB: serial: mct_u232: fix modem-status error handling USB: serial: io_edgeport: fix descriptor error handling phy: qcom-usb-hs: Add depends on EXTCON serial: 8250_omap: Fix probe and remove for PM runtime scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m MIPS: R2-on-R6 MULTU/MADDU/MSUBU emulation bugfix brcmfmac: Ensure pointer correctly set if skb data location changes brcmfmac: Make skb header writable before use staging: wlan-ng: add missing byte order conversion staging: emxx_udc: remove incorrect __init annotations ALSA: hda - Fix deadlock of controller device lock at unbinding tcp: do not underestimate skb->truesize in tcp_trim_head() bpf, arm64: fix jit branch offset related to ldimm64 tcp: fix wraparound issue in tcp_lp tcp: do not inherit fastopen_req from parent ipv4, ipv6: ensure raw socket message is big enough to hold an IP header rtnetlink: NUL-terminate IFLA_PHYS_PORT_NAME string ipv6: initialize route null entry in addrconf_init() ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf bnxt_en: allocate enough space for ->ntp_fltr_bmap f2fs: sanity check segment count drm/ttm: fix use-after-free races in vm fault handling block: get rid of blk_integrity_revalidate() Linux 4.4.68 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-05-14block: get rid of blk_integrity_revalidate()Ilya Dryomov
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream. Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") introduced blk_integrity_revalidate(), which seems to assume ownership of the stable pages flag and unilaterally clears it if no blk_integrity profile is registered: if (bi->profile) disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; else disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; It's called from revalidate_disk() and rescan_partitions(), making it impossible to enable stable pages for drivers that support partitions and don't use blk_integrity: while the call in revalidate_disk() can be trivially worked around (see zram, which doesn't support partitions and hence gets away with zram_revalidate_disk()), rescan_partitions() can be triggered from userspace at any time. This breaks rbd, where the ceph messenger is responsible for generating/verifying CRCs. Since blk_integrity_{un,}register() "must" be used for (un)registering the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES setting there. This way drivers that call blk_integrity_register() and use integrity infrastructure won't interfere with drivers that don't but still want stable pages. Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> [idryomov@gmail.com: backport to < 4.11: bdi is embedded in queue] Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-27Merge 4.4.64 into android-4.4Greg Kroah-Hartman
Changes in 4.4.64: KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings KEYS: Change the name of the dead type to ".dead" to prevent user access KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings tracing: Allocate the snapshot buffer before enabling probe ring-buffer: Have ring_buffer_iter_empty() return true when empty cifs: Do not send echoes before Negotiate is complete CIFS: remove bad_network_name flag s390/mm: fix CMMA vs KSM vs others Drivers: hv: don't leak memory in vmbus_establish_gpadl() Drivers: hv: get rid of timeout in vmbus_open() Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg() VSOCK: Detach QP check should filter out non matching QPs. Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled ACPI / power: Avoid maybe-uninitialized warning mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card mac80211: reject ToDS broadcast data frames ubi/upd: Always flush after prepared for an update powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd Tools: hv: kvp: ensure kvp device fd is closed on exec Drivers: hv: balloon: keep track of where ha_region starts Drivers: hv: balloon: account for gaps in hot add regions hv: don't reset hv_context.tsc_page on crash x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions block: fix del_gendisk() vs blkdev_ioctl crash tipc: fix crash during node removal Linux 4.4.64 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-27block: fix del_gendisk() vs blkdev_ioctl crashDan Williams
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream. When tearing down a block device early in its lifetime, userspace may still be performing discovery actions like blkdev_ioctl() to re-read partitions. The nvdimm_revalidate_disk() implementation depends on disk->driverfs_dev to be valid at entry. However, it is set to NULL in del_gendisk() and fatally this is happening *before* the disk device is deleted from userspace view. There's no reason for del_gendisk() to clear ->driverfs_dev. That device is the parent of the disk. It is guaranteed to not be freed until the disk, as a child, drops its ->parent reference. We could also fix this issue locally in nvdimm_revalidate_disk() by using disk_to_dev(disk)->parent, but lets fix it globally since ->driverfs_dev follows the lifetime of the parent. Longer term we should probably just add a @parent parameter to add_disk(), and stop carrying this pointer in the gendisk. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm] CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G O 4.4.0-rc5 #2257 [..] Call Trace: [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0 [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70 [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0 [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40 [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0 [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0 [<ffffffff812916dd>] block_ioctl+0x3d/0x50 [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560 [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100 [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70 [<ffffffff81264f09>] SyS_ioctl+0x79/0x90 [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76 Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Reported-by: Robert Hu <robert.hu@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-18Merge 4.4.62 into android-4.4Greg Kroah-Hartman
Changes in 4.4.62: drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3 drm/i915: Stop using RP_DOWN_EI on Baytrail usb: dwc3: gadget: delay unmap of bounced requests mtd: bcm47xxpart: fix parsing first block after aligned TRX MIPS: Introduce irq_stack MIPS: Stack unwinding while on IRQ stack MIPS: Only change $28 to thread_info if coming from user mode MIPS: Switch to the irq_stack in interrupts MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch crypto: caam - fix RNG deinstantiation error checking net/packet: fix overflow in check for priv area size blk-mq: Avoid memory reclaim when remapping queues usb: hub: Wait for connection to be reestablished after port reset net/mlx4_en: Fix bad WQE issue net/mlx4_core: Fix racy CQ (Completion Queue) free net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions ibmveth: set correct gso_size and gso_type Linux 4.4.62 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-18blk-mq: Avoid memory reclaim when remapping queuesGabriel Krisman Bertazi
commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream. While stressing memory and IO at the same time we changed SMT settings, we were able to consistently trigger deadlocks in the mm system, which froze the entire machine. I think that under memory stress conditions, the large allocations performed by blk_mq_init_rq_map may trigger a reclaim, which stalls waiting on the block layer remmaping completion, thus deadlocking the system. The trace below was collected after the machine stalled, waiting for the hotplug event completion. The simplest fix for this is to make allocations in this path non-reclaimable, with GFP_NOIO. With this patch, We couldn't hit the issue anymore. This should apply on top of Jens's for-next branch cleanly. Changes since v1: - Use GFP_NOIO instead of GFP_NOWAIT. Call Trace: [c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable) [c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430 [c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0 [c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0 [c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30 [c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0 [c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0 [c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs] [c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs] [c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs] [c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210 [c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0 [c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0 [c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520 [c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250 [c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0 [c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380 [c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360 [c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0 [c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250 [c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100 [c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0 [c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0 [c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250 [c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120 [c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0 [c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150 [c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0 [c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120 [c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0 [c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0 [c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0 [c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250 [c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0 [c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270 [c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110 [c000000f0160be30] [c000000000009204] system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-11Merge 4.4.60 into android-4.4Xin Li
Changes in 4.4.60: libceph: force GFP_NOIO for socket allocations xen/setup: Don't relocate p2m over existing one scsi: mpt3sas: fix hang on ata passthrough commands scsi: sg: check length passed to SG_NEXT_CMD_LEN scsi: libsas: fix ata xfer length ALSA: seq: Fix race during FIFO resize ALSA: hda - fix a problem for lineout on a Dell AIO machine ASoC: atmel-classd: fix audio clock rate ACPI: Fix incompatibility with mcount-based function graph tracing ACPI: Do not create a platform_device for IOAPIC/IOxAPIC tty/serial: atmel: fix race condition (TX+DMA) tty/serial: atmel: fix TX path in atmel_console_write() USB: fix linked-list corruption in rh_call_control() KVM: x86: clear bus pointer when destroyed drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() MIPS: Lantiq: Fix cascaded IRQ setup rtc: s35390a: fix reading out alarm rtc: s35390a: make sure all members in the output are set rtc: s35390a: implement reset routine as suggested by the reference rtc: s35390a: improve irq handling KVM: kvm_io_bus_unregister_dev() should never fail power: reset: at91-poweroff: timely shutdown LPDDR memories blk: improve order of bio handling in generic_make_request() blk: Ensure users for current->bio_list can see the full list. padata: avoid race in reordering Linux 4.4.60 Change-Id: I705c78ccae62ca59f922164085e7ca03ad4ecc6b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-08blk: Ensure users for current->bio_list can see the full list.NeilBrown
commit f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f upstream. Commit 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()") changed current->bio_list so that it did not contain *all* of the queued bios, but only those submitted by the currently running make_request_fn. There are two places which walk the list and requeue selected bios, and others that check if the list is empty. These are no longer correct. So redefine current->bio_list to point to an array of two lists, which contain all queued bios, and adjust various code to test or walk both lists. Signed-off-by: NeilBrown <neilb@suse.com> Fixes: 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()") Signed-off-by: Jens Axboe <axboe@fb.com> [jwang: backport to 4.4] Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Restore changes in device-mapper from upstream version] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
2017-04-08blk: improve order of bio handling in generic_make_request()NeilBrown
commit 79bd99596b7305ab08109a8bf44a6a4511dbf1cd upstream. To avoid recursion on the kernel stack when stacked block devices are in use, generic_make_request() will, when called recursively, queue new requests for later handling. They will be handled when the make_request_fn for the current bio completes. If any bios are submitted by a make_request_fn, these will ultimately be handled seqeuntially. If the handling of one of those generates further requests, they will be added to the end of the queue. This strict first-in-first-out behaviour can lead to deadlocks in various ways, normally because a request might need to wait for a previous request to the same device to complete. This can happen when they share a mempool, and can happen due to interdependencies particular to the device. Both md and dm have examples where this happens. These deadlocks can be erradicated by more selective ordering of bios. Specifically by handling them in depth-first order. That is: when the handling of one bio generates one or more further bios, they are handled immediately after the parent, before any siblings of the parent. That way, when generic_make_request() calls make_request_fn for some particular device, we can be certain that all previously submited requests for that device have been completely handled and are not waiting for anything in the queue of requests maintained in generic_make_request(). An easy way to achieve this would be to use a last-in-first-out stack instead of a queue. However this will change the order of consecutive bios submitted by a make_request_fn, which could have unexpected consequences. Instead we take a slightly more complex approach. A fresh queue is created for each call to a make_request_fn. After it completes, any bios for a different device are placed on the front of the main queue, followed by any bios for the same device, followed by all bios that were already on the queue before the make_request_fn was called. This provides the depth-first approach without reordering bios on the same level. This, by itself, it not enough to remove all deadlocks. It just makes it possible for drivers to take the extra step required themselves. To avoid deadlocks, drivers must never risk waiting for a request after submitting one to generic_make_request. This includes never allocing from a mempool twice in the one call to a make_request_fn. A common pattern in drivers is to call bio_split() in a loop, handling the first part and then looping around to possibly split the next part. Instead, a driver that finds it needs to split a bio should queue (with generic_make_request) the second part, handle the first part, and then return. The new code in generic_make_request will ensure the requests to underlying bios are processed first, then the second bio that was split off. If it splits again, the same process happens. In each case one bio will be completely handled before the next one is attempted. With this is place, it should be possible to disable the punt_bios_to_recover() recovery thread for many block devices, and eventually it may be possible to remove it completely. Ref: http://www.spinics.net/lists/raid/msg54680.html Tested-by: Jinpu Wang <jinpu.wang@profitbricks.com> Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> [jwang: backport to 4.4] Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30Merge 4.4.48 into android-4.4Greg Kroah-Hartman
Changes in 4.4.48: net/openvswitch: Set the ipv6 source tunnel key address attribute correctly net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled net: properly release sk_frag.page amd-xgbe: Fix jumbo MTU processing on newer hardware net: unix: properly re-increment inflight counter of GC discarded candidates net/mlx5: Increase number of max QPs in default profile net/mlx5e: Count LRO packets correctly net: bcmgenet: remove bcmgenet_internal_phy_setup() ipv4: provide stronger user input validation in nl_fib_input() socket, bpf: fix sk_filter use after free in sk_clone_lock tcp: initialize icsk_ack.lrcvtime at session start time Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000 Input: iforce - validate number of endpoints before using them Input: ims-pcu - validate number of endpoints before using them Input: hanwang - validate number of endpoints before using them Input: yealink - validate number of endpoints before using them Input: cm109 - validate number of endpoints before using them Input: kbtab - validate number of endpoints before using them Input: sur40 - validate number of endpoints before using them ALSA: seq: Fix racy cell insertions during snd_seq_pool_done() ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call ALSA: hda - Adding a group of pin definition to fix headset problem USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems USB: serial: qcserial: add Dell DW5811e ACM gadget: fix endianness in notifications usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk USB: uss720: fix NULL-deref at probe USB: lvtest: fix NULL-deref at probe USB: idmouse: fix NULL-deref at probe USB: wusbcore: fix NULL-deref at probe usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer usb: hub: Fix crash after failure to read BOS descriptor uwb: i1480-dfu: fix NULL-deref at probe uwb: hwa-rc: fix NULL-deref at probe mmc: ushc: fix NULL-deref at probe iio: adc: ti_am335x_adc: fix fifo overrun recovery iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3 parport: fix attempt to write duplicate procfiles ext4: mark inode dirty after converting inline directory mmc: sdhci: Do not disable interrupts while waiting for clock xen/acpi: upload PM state from init-domain to Xen iommu/vt-d: Fix NULL pointer dereference in device_to_iommu ARM: at91: pm: cpu_idle: switch DDR to power-down mode ARM: dts: at91: sama5d2: add dma properties to UART nodes cpufreq: Restore policy min/max limits on CPU online raid10: increment write counter after bio is split libceph: don't set weight to IN when OSD is destroyed xfs: don't allow di_size with high bit set xfs: fix up xfs_swap_extent_forks inline extent handling nl80211: fix dumpit error path RTNL deadlocks USB: usbtmc: add missing endpoint sanity check xfs: clear _XBF_PAGES from buffers when readahead page xen: do not re-use pirq number cached in pci device msi msg data igb: Workaround for igb i210 firmware issue igb: add i211 to i210 PHY workaround x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic PCI: Separate VF BAR updates from standard BAR updates PCI: Remove pci_resource_bar() and pci_iov_resource_bar() PCI: Add comments about ROM BAR updating PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE PCI: Don't update VF BARs while VF memory space is enabled PCI: Update BARs using property bits appropriate for type PCI: Ignore BAR updates on virtual functions PCI: Do any VF BAR updates before enabling the BARs vfio/spapr: Postpone allocation of userspace version of TCE table block: allow WRITE_SAME commands with the SG_IO ioctl s390/zcrypt: Introduce CEX6 toleration uvcvideo: uvc_scan_fallback() for webcams with broken chain ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520 ACPI / blacklist: Make Dell Latitude 3350 ethernet work serial: 8250_pci: Detach low-level driver during PCI error recovery fbcon: Fix vc attr at deinit crypto: algif_hash - avoid zero-sized array Linux 4.4.58 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-03-30block: allow WRITE_SAME commands with the SG_IO ioctlSumit Semwal
From: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> [ Upstream commit 25cdb64510644f3e854d502d69c73f21c6df88a9 ] The WRITE_SAME commands are not present in the blk_default_cmd_filter write_ok list, and thus are failed with -EPERM when the SG_IO ioctl() is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users). [ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ] The problem can be reproduced with the sg_write_same command # sg_write_same --num 1 --xferlen 512 /dev/sda # # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_same --num 1 --xferlen 512 /dev/sda' Write same: pass through os error: Operation not permitted # For comparison, the WRITE_VERIFY command does not observe this problem, since it is in that list: # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda' # So, this patch adds the WRITE_SAME commands to the list, in order for the SG_IO ioctl to finish successfully: # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_same --num 1 --xferlen 512 /dev/sda' # That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]), which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu). In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls, which are translated to write-same calls in the guest kernel, and then into SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest: [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current] [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00 [...] blk_update_request: I/O error, dev sda, sector 17096824 Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Brahadambal Srinivasan <latha@linux.vnet.ibm.com> Reported-by: Manjunatha H R <manjuhr1@in.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-02Merge branch 'upstream-linux-4.4.y' into android-4.4Todd Kjos
2017-02-26blk-mq: really fix plug list flushing for nomerge queuesOmar Sandoval
commit 87c279e613f848c691111b29d49de8df3f4f56da upstream. Commit 0809e3ac6231 ("block: fix plug list flushing for nomerge queues") updated blk_mq_make_request() to set request_count even when blk_queue_nomerges() returns true. However, blk_mq_make_request() only does limited plugging and doesn't use request_count; blk_sq_make_request() is the one that should have been fixed. Do that and get rid of the unnecessary work in the mq version. Fixes: 0809e3ac6231 ("block: fix plug list flushing for nomerge queues") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26Merge tag 'v4.4.45' into android-4.4.yDmitry Shmidt
This is the 4.4.45 stable release
2017-01-19blk-mq: Always schedule hctx->next_cpuGabriel Krisman Bertazi
commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream. Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU") attempts to avoid triggering the WARN_ON in __blk_mq_run_hw_queue when the expected CPU is dead. Problem is, in the last batch execution before round robin, blk_mq_hctx_next_cpu can schedule a dead CPU and also update next_cpu to the next alive CPU in the mask, which will trigger the WARN_ON despite the previous workaround. The following patch fixes this scenario by always scheduling the value in hctx->next_cpu. This changes the moment when we round-robin the CPU running the hctx, but it really doesn't matter, since it still executes BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU. Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU") Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19block: cfq_cpd_alloc() should use @gfpTejun Heo
commit ebc4ff661fbe76781c6b16dfb7b754a5d5073f8e upstream. cfq_cpd_alloc() which is the cpd_alloc_fn implementation for cfq was incorrectly hard coding GFP_KERNEL instead of using the mask specified through the @gfp parameter. This currently doesn't cause any actual issues because all current callers specify GFP_KERNEL. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: e4a9bde9589f ("blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods") Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-17Merge tag 'v4.4.43' into android-4.4.yDmitry Shmidt
This is the 4.4.43 stable release
2017-01-09Merge tag 'v4.4.40' into android-4.4.yDmitry Shmidt
This is the 4.4.40 stable release
2017-01-09sg_write()/bsg_write() is not fit to be called under KERNEL_DSAl Viro
commit 128394eff343fc6d2f32172f03e24829539c5835 upstream. Both damn things interpret userland pointers embedded into the payload; worse, they are actually traversing those. Leaving aside the bad API design, this is very much _not_ safe to call with KERNEL_DS. Bail out early if that happens. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06blk-mq: Do not invoke .queue_rq() for a stopped queueBart Van Assche
commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream. The meaning of the BLK_MQ_S_STOPPED flag is "do not call .queue_rq()". Hence modify blk_mq_make_request() such that requests are queued instead of issued if a queue has been stopped. Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-12Merge tag 'v4.4.38' into android-4.4.yDmitry Shmidt
This is the 4.4.38 stable release
2016-12-10Don't feed anything but regular iovec's to blk_rq_map_user_iovLinus Torvalds
commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0 upstream. In theory we could map other things, but there's a reason that function is called "user_iov". Using anything else (like splice can do) just confuses it. Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28Merge tag 'v4.4.28' into android-4.4.yDmitry Shmidt
This is the 4.4.28 stable release
2016-10-28blkcg: Unlock blkcg_pol_mutex only once when cpd == NULLBart Van Assche
commit bbb427e342495df1cda10051d0566388697499c0 upstream. Unlocking a mutex twice is wrong. Hence modify blkcg_policy_register() such that blkcg_pol_mutex is unlocked once if cpd == NULL. This patch avoids that smatch reports the following error: block/blk-cgroup.c:1378: blkcg_policy_register() error: double unlock 'mutex:&blkcg_pol_mutex' Fixes: 06b285bd1125 ("blkcg: fix blkcg_policy_data allocation bug") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-27Merge tag 'v4.4.27' into android-4.4.yDmitry Shmidt
This is the 4.4.27 stable release
2016-10-22cfq: fix starvation of asynchronous writesGlauber Costa
commit 3932a86b4b9d1f0b049d64d4591ce58ad18b44ec upstream. While debugging timeouts happening in my application workload (ScyllaDB), I have observed calls to open() taking a long time, ranging everywhere from 2 seconds - the first ones that are enough to time out my application - to more than 30 seconds. The problem seems to happen because XFS may block on pending metadata updates under certain circumnstances, and that's confirmed with the following backtrace taken by the offcputime tool (iovisor/bcc): ffffffffb90c57b1 finish_task_switch ffffffffb97dffb5 schedule ffffffffb97e310c schedule_timeout ffffffffb97e1f12 __down ffffffffb90ea821 down ffffffffc046a9dc xfs_buf_lock ffffffffc046abfb _xfs_buf_find ffffffffc046ae4a xfs_buf_get_map ffffffffc046babd xfs_buf_read_map ffffffffc0499931 xfs_trans_read_buf_map ffffffffc044a561 xfs_da_read_buf ffffffffc0451390 xfs_dir3_leaf_read.constprop.16 ffffffffc0452b90 xfs_dir2_leaf_lookup_int ffffffffc0452e0f xfs_dir2_leaf_lookup ffffffffc044d9d3 xfs_dir_lookup ffffffffc047d1d9 xfs_lookup ffffffffc0479e53 xfs_vn_lookup ffffffffb925347a path_openat ffffffffb9254a71 do_filp_open ffffffffb9242a94 do_sys_open ffffffffb9242b9e sys_open ffffffffb97e42b2 entry_SYSCALL_64_fastpath 00007fb0698162ed [unknown] Inspecting my run with blktrace, I can see that the xfsaild kthread exhibit very high "Dispatch wait" times, on the dozens of seconds range and consistent with the open() times I have saw in that run. Still from the blktrace output, we can after searching a bit, identify the request that wasn't dispatched: 8,0 11 152 81.092472813 804 A WM 141698288 + 8 <- (8,1) 141696240 8,0 11 153 81.092472889 804 Q WM 141698288 + 8 [xfsaild/sda1] 8,0 11 154 81.092473207 804 G WM 141698288 + 8 [xfsaild/sda1] 8,0 11 206 81.092496118 804 I WM 141698288 + 8 ( 22911) [xfsaild/sda1] <==== 'I' means Inserted (into the IO scheduler) ===================================> 8,0 0 289372 96.718761435 0 D WM 141698288 + 8 (15626265317) [swapper/0] <==== Only 15s later the CFQ scheduler dispatches the request ======================> As we can see above, in this particular example CFQ took 15 seconds to dispatch this request. Going back to the full trace, we can see that the xfsaild queue had plenty of opportunity to run, and it was selected as the active queue many times. It would just always be preempted by something else (example): 8,0 1 0 81.117912979 0 m N cfq1618SN / insert_request 8,0 1 0 81.117913419 0 m N cfq1618SN / add_to_rr 8,0 1 0 81.117914044 0 m N cfq1618SN / preempt 8,0 1 0 81.117914398 0 m N cfq767A / slice expired t=1 8,0 1 0 81.117914755 0 m N cfq767A / resid=40 8,0 1 0 81.117915340 0 m N / served: vt=1948520448 min_vt=1948520448 8,0 1 0 81.117915858 0 m N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0 where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB IO dispatchers. The requests preempting the xfsaild queue are synchronous requests. That's a characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests. While it can be argued that preempting ASYNC requests in favor of SYNC is part of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's goal. Moreover, unless I am misunderstanding something, that breaks the expectation set by the "fifo_expire_async" tunable, which in my system is set to the default. Looking at the code, it seems to me that the issue is that after we make an async queue active, there is no guarantee that it will execute any request. When the queue itself tests if it cfq_may_dispatch() it can bail if it sees SYNC requests in flight. An incoming request from another queue can also preempt it in such situation before we have the chance to execute anything (as seen in the trace above). This patch sets the must_dispatch flag if we notice that we have requests that are already fifo_expired. This flag is always cleared after cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin the queue for subsequent requests (unless they are themselves expired) Care is taken during preempt to still allow rt requests to preempt us regardless. Testing my workload with this patch applied produces much better results. From the application side I see no timeouts, and the open() latency histogram generated by systemtap looks much better, with the worst outlier at 131ms: Latency histogram of xfs_buf_lock acquisition (microseconds): value |-------------------------------------------------- count 0 | 11 1 |@@@@ 161 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1966 4 |@ 54 8 | 36 16 | 7 32 | 0 64 | 0 ~ 1024 | 0 2048 | 0 4096 | 1 8192 | 1 16384 | 2 32768 | 0 65536 | 0 131072 | 1 262144 | 0 524288 | 0 Signed-off-by: Glauber Costa <glauber@scylladb.com> CC: Jens Axboe <axboe@kernel.dk> CC: linux-block@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Glauber Costa <glauber@scylladb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-14Merge tag 'v4.4.24' into android-4.4.yDmitry Shmidt
This is the 4.4.24 stable release
2016-10-07blk-mq: actually hook up defer list when running requestsOmar Sandoval
commit 52b9c330c6a8a4b5a1819bdaddf4ec76ab571e81 upstream. If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip over the rest of the loop body. However, dptr is assigned later in the loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd want it for. NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other in-tree driver, but if the code's going to be there, it might as well work. Fixes: 74c450521dd8 ("blk-mq: add a 'list' parameter to ->queue_rq()") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-16Merge tag 'v4.4.21' into android-4.4.yDmitry Shmidt
This is the 4.4.21 stable release Change-Id: I03e47d6fdca8084641c4b4f9658ea0b0edb8f297
2016-09-15block: make sure a big bio is split into at most 256 bvecsMing Lei
commit 4d70dca4eadf2f95abe389116ac02b8439c2d16c upstream. After arbitrary bio size was introduced, the incoming bio may be very big. We have to split the bio into small bios so that each holds at most BIO_MAX_PAGES bvecs for safety reason, such as bio_clone(). This patch fixes the following kernel crash: > [ 172.660142] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 > [ 172.660229] IP: [<ffffffff811e53b4>] bio_trim+0xf/0x2a > [ 172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0 > [ 172.660399] Oops: 0000 [#1] SMP > [...] > [ 172.664780] Call Trace: > [ 172.664813] [<ffffffffa007f3be>] ? raid1_make_request+0x2e8/0xad7 [raid1] > [ 172.664846] [<ffffffff811f07da>] ? blk_queue_split+0x377/0x3d4 > [ 172.664880] [<ffffffffa005fb5f>] ? md_make_request+0xf6/0x1e9 [md_mod] > [ 172.664912] [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155 > [ 172.664947] [<ffffffffa0445c89>] ? prio_io+0x85/0x95 [bcache] > [ 172.664981] [<ffffffffa0448252>] ? register_cache_set+0x355/0x8d0 [bcache] > [ 172.665016] [<ffffffffa04497d3>] ? register_bcache+0x1006/0x1174 [bcache] The issue can be reproduced by the following steps: - create one raid1 over two virtio-blk - build bcache device over the above raid1 and another cache device and bucket size is set as 2Mbytes - set cache mode as writeback - run random write over ext4 on the bcache device Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios) Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de> Reported-by: Eric Wheeler <bcache@lists.ewheeler.net> Cc: Shaohua Li <shli@fb.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15block: Fix race triggered by blk_set_queue_dying()Bart Van Assche
commit 1b856086813be9371929b6cc62045f9fd470f5a0 upstream. blk_set_queue_dying() can be called while another thread is submitting I/O or changing queue flags, e.g. through dm_stop_queue(). Hence protect the QUEUE_FLAG_DYING flag change with locking. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15blk-mq: End unstarted requests on dying queueKeith Busch
[ Upstream commit a59e0f5795fe52dad42a99c00287e3766153b312 ] Go directly to ending a request if it wasn't started. Previously, completing a request may invoke a driver callback for a request it didn't initialize. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-08Merge remote-tracking branch 'common/android-4.4' into android-4.4.yDmitry Shmidt
2016-09-08UPSTREAM: block: fix use-after-free in sys_ioprio_get()Omar Sandoval
(cherry picked from commit 8ba8682107ee2ca3347354e018865d8e1967c5f4) get_task_ioprio() accesses the task->io_context without holding the task lock and thus can race with exit_io_context(), leading to a use-after-free. The reproducer below hits this within a few seconds on my 4-core QEMU VM: #define _GNU_SOURCE #include <assert.h> #include <unistd.h> #include <sys/syscall.h> #include <sys/wait.h> int main(int argc, char **argv) { pid_t pid, child; long nproc, i; /* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */ syscall(SYS_ioprio_set, 1, 0, 0x6000); nproc = sysconf(_SC_NPROCESSORS_ONLN); for (i = 0; i < nproc; i++) { pid = fork(); assert(pid != -1); if (pid == 0) { for (;;) { pid = fork(); assert(pid != -1); if (pid == 0) { _exit(0); } else { child = wait(NULL); assert(child == pid); } } } pid = fork(); assert(pid != -1); if (pid == 0) { for (;;) { /* ioprio_get(IOPRIO_WHO_PGRP, 0); */ syscall(SYS_ioprio_get, 2, 0); } } } for (;;) { /* ioprio_get(IOPRIO_WHO_PGRP, 0); */ syscall(SYS_ioprio_get, 2, 0); } return 0; } This gets us KASAN dumps like this: [ 35.526914] ================================================================== [ 35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c [ 35.530009] Read of size 2 by task ioprio-gpf/363 [ 35.530009] ============================================================================= [ 35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected [ 35.530009] ----------------------------------------------------------------------------- [ 35.530009] Disabling lock debugging due to kernel taint [ 35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360 [ 35.530009] ___slab_alloc+0x55d/0x5a0 [ 35.530009] __slab_alloc.isra.20+0x2b/0x40 [ 35.530009] kmem_cache_alloc_node+0x84/0x200 [ 35.530009] create_task_io_context+0x2b/0x370 [ 35.530009] get_task_io_context+0x92/0xb0 [ 35.530009] copy_process.part.8+0x5029/0x5660 [ 35.530009] _do_fork+0x155/0x7e0 [ 35.530009] SyS_clone+0x19/0x20 [ 35.530009] do_syscall_64+0x195/0x3a0 [ 35.530009] return_from_SYSCALL_64+0x0/0x6a [ 35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060 [ 35.530009] __slab_free+0x27b/0x3d0 [ 35.530009] kmem_cache_free+0x1fb/0x220 [ 35.530009] put_io_context+0xe7/0x120 [ 35.530009] put_io_context_active+0x238/0x380 [ 35.530009] exit_io_context+0x66/0x80 [ 35.530009] do_exit+0x158e/0x2b90 [ 35.530009] do_group_exit+0xe5/0x2b0 [ 35.530009] SyS_exit_group+0x1d/0x20 [ 35.530009] entry_SYSCALL_64_fastpath+0x1a/0xa4 [ 35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080 [ 35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001 [ 35.530009] ================================================================== Fix it by grabbing the task lock while we poke at the io_context. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Change-Id: I3f5858cc9a1b9d4124ae7a6578660dec219d2c57 Bug: 30946378