Age | Commit message (Collapse) | Author |
|
struct vsock_transport contains function pointers called by AF_VSOCK
core code. The transport may want its own transport-specific function
pointers and they can be added after struct vsock_transport.
Allow the transport to fetch vsock_transport. It can downcast it to
access transport-specific function pointers.
The virtio transport will use this.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0b01aeb3d2fbf16787f0c9629f4ca52ae792f732)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: I442706ae71dc14c70fc4033d9719134c2d034509
|
|
There are several places where the listener and pending or accept queue
child sockets are accessed at the same time. Lockdep is unhappy that
two locks from the same class are held.
Tell lockdep that it is safe and document the lock ordering.
Originally Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> sent a similar
patch asking whether this is safe. I have audited the code and also
covered the vsock_pending_work() function.
Suggested-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4192f672fae559f32d82de72a677701853cc98a7)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: I0cb7ee964057e9338971e1a2043ae17557feaec7
|
|
This patch tries to implement an device IOTLB for vhost. This could be
used with userspace(qemu) implementation of DMA remapping
to emulate an IOMMU for the guest.
The idea is simple, cache the translation in a software device IOTLB
(which is implemented as an interval tree) in vhost and use vhost_net
file descriptor for reporting IOTLB miss and IOTLB
update/invalidation. When vhost meets an IOTLB miss, the fault
address, size and access can be read from the file. After userspace
finishes the translation, it writes the translated address to the
vhost_net file to update the device IOTLB.
When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq
addresses set by ioctl are treated as iova instead of virtual address and
the accessing can only be done through IOTLB instead of direct userspace
memory access. Before each round or vq processing, all vq metadata is
prefetched in device IOTLB to make sure no translation fault happens
during vq processing.
In most cases, virtqueues are contiguous even in virtual address space.
The IOTLB translation for virtqueue itself may make it a little
slower. We might add fast path cache on top of this patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
[mst: fix build warnings ]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[ weiyj.lk: missing unlock on error ]
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
(cherry picked from commit 6b1e6cc7855b09a0a9bfa1d9f30172ba366f161c)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + vsock adb tunnel
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: I10e4e64d6bc9b36a0d9b444c2319e290921c63c6
|
|
Current pre-sorted memory region array has some limitations for future
device IOTLB conversion:
1) need extra work for adding and removing a single region, and it's
expected to be slow because of sorting or memory re-allocation.
2) need extra work of removing a large range which may intersect
several regions with different size.
3) need trick for a replacement policy like LRU
To overcome the above shortcomings, this patch convert it to interval
tree which can easily address the above issue with almost no extra
work.
The patch could be used for:
- Extend the current API and only let the userspace to send diffs of
memory table.
- Simplify Device IOTLB implementation.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit a9709d6874d55130663567577a9b05c35138cc6b)
[astrachan: Backported around stable backport 711df71 ("vhost_net: stop
device during reset owner")]
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I51c7c7229908a5ce1f082a80eeda5a01e85c2234
|
|
This patch introduces vhost memory accessors which were just wrappers
for userspace address access helpers. This is a requirement for vhost
device iotlb implementation which will add iotlb translations in those
accessors.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit bfe2bc512884d0b1c5297a15350f940ca80e439b)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: Ia67c171384109e646f55027a71dd8df9c6b9c61a
|
|
We don't stop rx polling socket during rx processing, this will lead
unnecessary wakeups from under layer net devices (E.g
sock_def_readable() form tun). Rx will be slowed down in this
way. This patch avoids this by stop polling socket during rx
processing. A small drawback is that this introduces some overheads in
light load case because of the extra start/stop polling, but single
netperf TCP_RR does not notice any change. In a super heavy load case,
e.g using pktgen to inject packet to guest, we get about ~8.8%
improvement on pps:
before: ~1240000 pkt/s
after: ~1350000 pkt/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8241a1e466cd56e6c10472cac9c1ad4e54bc65db)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: Ia51c61a6c1976b6f6406342f369ffc365d964078
|
|
The vsock_transport structure is never modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 56130915bbe31656c80f7493d28536693f8de0e2)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: I59ffded185fbf22aaeb39753870ef9c866ab1a2a
|
|
We use spinlock to synchronize the work list now which may cause
unnecessary contentions. So this patch switch to use llist to remove
this contention. Pktgen tests shows about 5% improvement:
Before:
~1300000 pps
After:
~1370000 pps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 04b96e5528ca97199b429810fe963185a67dd40e)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: Icf032cf2010eaedd92dc5906d454274709b4a9b4
|
|
We used to implement the work flushing through tracking queued seq,
done seq, and the number of flushing. This patch simplify this by just
implement work flushing through another kind of vhost work with
completion. This will be used by lockless enqueuing patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 7235acdb1144460d9f520f0d931f3cbb79eb244c)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: Id3903050706dd734a0217c56ad8ca99b2b22471e
|
|
If skb_recv_datagram returns an skb, we should ignore the err
value returned. Otherwise, datagram receives will return EAGAIN
when they have to wait for a datagram.
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9c995cc9a206a008699da82f6cd01e9b2615649a)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: I69d487529656cb2fe8be9c2cef0db440d4db5cac
|
|
When a thread is prepared for waiting by calling prepare_to_wait, sleeping
is not allowed until either the wait has taken place or finish_wait has
been called. The existing code in af_vsock imposed unnecessary no-sleep
assumptions to a broad list of backend functions.
This patch shrinks the influence of prepare_to_wait to the area where it
is strictly needed, therefore relaxing the no-sleep restriction there.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7f9b5e7f8eccfd68ffa7b8d74b07c478bb9e7f0)
[astrachan: Backported around stable backport 3223ea1 ("vsock: use new
wait API for vsock_stream_sendmsg()")]
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: Ic1d7bae4b1187ad48194bf4d0b1dd09ab0275734
|
|
This patch tries to poll for new added tx buffer or socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling were specified through a new kind of vring ioctl.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0308813724606549436d30efd877a80c8e00790e)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I93db6c14e39b2db04e047ee2e0dce3af5436d95e
|
|
This patch introduces a helper which will return true if we're sure
that the available ring is empty for a specific vq. When we're not
sure, e.g vq access failure, return false instead. This could be used
for busy polling code to exit the busy loop.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d4a60603fa0b42012decfa058dfa44cffde7a10c)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I1d4e972543470530a5bc37e19f4199b1f345e042
|
|
This path introduces a helper which can give a hint for whether or not
there's a work queued in the work list. This could be used for busy
polling code to exit the busy loop.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 526d3e7ff515582eaf7cf5d9da2a11bf681204e6)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I2717a212f888b0d3ea78f1ce0bcc21ec9dee5050
|
|
Looking at how callers use this, maybe we should just rename init_used
to vhost_vq_init_access. The _used suffix was a hint that we
access the vq used ring. But maybe what callers care about is
that it must be called after access_ok.
Also, this function manipulates the vq->is_le field which isn't related
to the vq used ring.
This patch simply renames vhost_init_used() to vhost_vq_init_access() as
suggested by Michael.
No behaviour change.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 80f7d0301e7913f704d3505722f806717c61dff5)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: If8ef33d7f00e515afbff3adc7b00607c4a58fdf6
|
|
The default use case for vhost is when the host and the vring have the
same endianness (default native endianness). But there are cases where
they differ and vhost should byteswap when accessing the vring.
The first case is when the host is big endian and the vring belongs to
a virtio 1.0 device, which is always little endian.
This is covered by the vq->is_le field. This field is initialized when
userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
stops.
We already have a vhost_init_is_le() helper, but the reset operation is
opencoded as follows:
vq->is_le = virtio_legacy_is_little_endian();
It isn't clear that we are resetting vq->is_le here.
This patch moves the code to a helper with a more explicit name.
The other case where we may have to byteswap is when the architecture can
switch endianness at runtime (bi-endian). If endianness differs in the host
and in the guest, then legacy devices need to be used in cross-endian mode.
This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
introduces a vq->user_be field. Userspace may enable cross-endian mode
by calling the SET_VRING_ENDIAN ioctl before the device is started. The
cross-endian mode is disabled when the device is stopped.
The current names of the helpers that manipulate vq->user_be are unclear.
This patch renames those helpers to clearly show that this is cross-endian
stuff and with explicit enable/disable semantics.
No behaviour change.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c507203756ca6303df2191c96c1385e965e2f0b7)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I42271a7cbc4ecc3826116943805bc59d4f8bd192
|
|
We don't want side effects. If something fails, we rollback vq->is_le to
its previous value.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e1f33be9186363da7955bcb5f0b03e6685544c50)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I18c57e5c78aa3e89d267213770f915a5a5c76100
|
|
checkpatch.pl wants arrays of strings declared as follows:
static const char * const names[] = { "vq-1", "vq-2", "vq-3" };
Currently the find_vqs() function takes a const char *names[] argument
so passing checkpatch.pl's const char * const names[] results in a
compiler error due to losing the second const.
This patch adjusts the find_vqs() prototype and updates all virtio
transports. This makes it possible for virtio_balloon.c, virtio_input.c,
virtgpu_kms.c, and virtio_rpmsg_bus.c to use the checkpatch.pl-friendly
type.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
(cherry picked from commit f7ad26ff952b3ca2702d7da03aad0ab1f6c01d7c)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Cody Schuffelen <schuffelen@google.com>
Change-Id: I23513ea85e7a43efd0c604fc4445b301b4f610ba
|
|
We do not often add/delete a napi context.
Moving napi_hash[] into read mostly section avoids potential false sharing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6180d9de61a5c461f9e3efef5417a844701dbbb2)
Bug: 121166534
Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS
Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: I891b9dfef88c44f6480842754cd7d81e7a250792
|
|
This option is reverted in change Iebcb0cd9982f36c4bd2552811f9147325a291db0.
Bug: 71728490
Change-Id: If25dfd56abed16bfe579840e4a52d6bedbae69ca
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
|
|
This reverts commit ace74ccf82cfb2b73ce1df2e698d20c2fbc559dd.
Bug: 71728490
Change-Id: Iebcb0cd9982f36c4bd2552811f9147325a291db0
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
|
|
It's only in DIO before, complement for BIO.
Bug: 120445624
Change-Id: I90b6fb15e355978da8805ed6306c595819be989d
Signed-off-by: Peng Zhou <peng.zhou@mediatek.com>
|
|
Changes in 4.4.170
USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only
USB: serial: option: add GosunCn ZTE WeLink ME3630
USB: serial: option: add HP lt4132
USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)
USB: serial: option: add Fibocom NL668 series
USB: serial: option: add Telit LN940 series
mmc: core: Reset HPI enabled state during re-init and in case of errors
mmc: omap_hsmmc: fix DMA API warning
gpio: max7301: fix driver for use with CONFIG_VMAP_STACK
Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
x86/mtrr: Don't copy uninitialized gentry fields back to userspace
drm/ioctl: Fix Spectre v1 vulnerabilities
ip6mr: Fix potential Spectre v1 vulnerability
ipv4: Fix potential Spectre v1 vulnerability
ax25: fix a use-after-free in ax25_fillin_cb()
ibmveth: fix DMA unmap error in ibmveth_xmit_start error path
ieee802154: lowpan_header_create check must check daddr
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
isdn: fix kernel-infoleak in capi_unlocked_ioctl
netrom: fix locking in nr_find_socket()
packet: validate address length
packet: validate address length if non-zero
sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event
vhost: make sure used idx is seen before log in vhost_add_used_n()
VSOCK: Send reset control packet when socket is partially bound
xen/netfront: tolerate frags with no data
gro_cell: add napi_disable in gro_cells_destroy
sock: Make sock->sk_stamp thread-safe
ALSA: rme9652: Fix potential Spectre v1 vulnerability
ALSA: emu10k1: Fix potential Spectre v1 vulnerabilities
ALSA: pcm: Fix potential Spectre v1 vulnerability
ALSA: emux: Fix potential Spectre v1 vulnerabilities
ALSA: hda: add mute LED support for HP EliteBook 840 G4
ALSA: hda/tegra: clear pending irq handlers
USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
USB: serial: option: add Fibocom NL678 series
usb: r8a66597: Fix a possible concurrency use-after-free bug in r8a66597_endpoint_disable()
Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
perf pmu: Suppress potential format-truncation warning
ext4: fix possible use after free in ext4_quota_enable
ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
ext4: fix EXT4_IOC_GROUP_ADD ioctl
ext4: force inode writes when nfsd calls commit_metadata()
spi: bcm2835: Fix race on DMA termination
spi: bcm2835: Fix book-keeping of DMA termination
spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader.
media: vivid: free bitmap_cap when updating std/timings/etc.
MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()
MIPS: Align kernel load address to 64KB
CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
spi: bcm2835: Unbreak the build of esoteric configs
powerpc: Fix COFF zImage booting on old powermacs
ARM: imx: update the cpu power up timing setting on i.mx6sx
Input: restore EV_ABS ABS_RESERVED
checkstack.pl: fix for aarch64
xfrm: Fix bucket count reported to userspace
scsi: bnx2fc: Fix NULL dereference in error handling
Input: omap-keypad - fix idle configuration to not block SoC idle states
scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
fork: record start_time late
hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
mm, devm_memremap_pages: kill mapping "System RAM" support
sunrpc: fix cache_head leak due to queued request
sunrpc: use SVC_NET() in svcauth_gss_* functions
crypto: x86/chacha20 - avoid sleeping with preemption disabled
ALSA: cs46xx: Potential NULL dereference in probe
ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
dlm: fixed memory leaks after failed ls_remove_names allocation
dlm: possible memory leak on error path in create_lkb()
dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
dlm: memory leaks on error path in dlm_user_request()
gfs2: Fix loop in gfs2_rbm_find
b43: Fix error in cordic routine
9p/net: put a lower bound on msize
iommu/vt-d: Handle domain agaw being less than iommu agaw
ceph: don't update importing cap's mseq when handing cap export
genwqe: Fix size check
intel_th: msu: Fix an off-by-one in attribute store
power: supply: olpc_battery: correct the temperature units
Linux 4.4.170
Change-Id: I1b2927583f8853bfeb3ad11d045c2cf5c5c926f3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
|
|
|
commit ed54ffbe554f0902689fd6d1712bbacbacd11376 upstream.
According to [1] and [2], the temperature values are in tenths of degree
Celsius. Exposing the Celsius value makes the battery appear on fire:
$ upower -i /org/freedesktop/UPower/devices/battery_olpc_battery
...
temperature: 236.9 degrees C
Tested on OLPC XO-1 and OLPC XO-1.75 laptops.
[1] include/linux/power_supply.h
[2] Documentation/power/power_supply_class.txt
Fixes: fb972873a767 ("[BATTERY] One Laptop Per Child power/battery driver")
Cc: stable@vger.kernel.org
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ec5b5ad6e272d8d6b92d1007f79574919862a2d2 upstream.
The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated
list of window sizes, passed from userspace. However, there is a bug in
the string parsing logic wherein it doesn't exclude the comma character
from the range of characters as it consumes them. This leads to an
out-of-bounds access given a sufficiently long list. For example:
> # echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40
> Read of size 1 at addr ffff8803ffcebcd1 by task sh/825
>
> CPU: 3 PID: 825 Comm: npktest.sh Tainted: G W 4.20.0-rc1+
> Call Trace:
> dump_stack+0x7c/0xc0
> print_address_description+0x6c/0x23c
> ? memchr+0x1e/0x40
> kasan_report.cold.5+0x241/0x308
> memchr+0x1e/0x40
> nr_pages_store+0x203/0xd00 [intel_th_msu]
Fix this by accounting for the comma character.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: ba82664c134ef ("intel_th: Add Memory Storage Unit driver")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fdd669684655c07dacbdb0d753fd13833de69a33 upstream.
Calling the test program genwqe_cksum with the default buffer size of
2MB triggers the following kernel warning on s390:
WARNING: CPU: 30 PID: 9311 at mm/page_alloc.c:3189 __alloc_pages_nodemask+0x45c/0xbe0
CPU: 30 PID: 9311 Comm: genwqe_cksum Kdump: loaded Not tainted 3.10.0-957.el7.s390x #1
task: 00000005e5d13980 ti: 00000005e7c6c000 task.ti: 00000005e7c6c000
Krnl PSW : 0704c00180000000 00000000002780ac (__alloc_pages_nodemask+0x45c/0xbe0)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000002932b8 0000000000b73d7c 0000000000000010 0000000000000009
0000000000000041 00000005e7c6f9b8 0000000000000001 00000000000080d0
0000000000000000 0000000000b70500 0000000000000001 0000000000000000
0000000000b70528 00000000007682c0 0000000000277df2 00000005e7c6f9a0
Krnl Code: 000000000027809e: de7195001000 ed 1280(114,%r9),0(%r1)
00000000002780a4: a774fead brc 7,277dfe
#00000000002780a8: a7f40001 brc 15,2780aa
>00000000002780ac: 92011000 mvi 0(%r1),1
00000000002780b0: a7f4fea7 brc 15,277dfe
00000000002780b4: 9101c6b6 tm 1718(%r12),1
00000000002780b8: a784ff3a brc 8,277f2c
00000000002780bc: a7f4fe2e brc 15,277d18
Call Trace:
([<0000000000277df2>] __alloc_pages_nodemask+0x1a2/0xbe0)
[<000000000013afae>] s390_dma_alloc+0xfe/0x310
[<000003ff8065f362>] __genwqe_alloc_consistent+0xfa/0x148 [genwqe_card]
[<000003ff80658f7a>] genwqe_mmap+0xca/0x248 [genwqe_card]
[<00000000002b2712>] mmap_region+0x4e2/0x778
[<00000000002b2c54>] do_mmap+0x2ac/0x3e0
[<0000000000292d7e>] vm_mmap_pgoff+0xd6/0x118
[<00000000002b081c>] SyS_mmap_pgoff+0xdc/0x268
[<00000000002b0a34>] SyS_old_mmap+0x8c/0xb0
[<000000000074e518>] sysc_tracego+0x14/0x1e
[<000003ffacf87dc6>] 0x3ffacf87dc6
turns out the check in __genwqe_alloc_consistent uses "> MAX_ORDER"
while the mm code uses ">= MAX_ORDER". Fix genwqe.
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream.
Updating mseq makes client think importer mds has accepted all prior
cap messages and importer mds knows what caps client wants. Actually
some cap messages may have been dropped because of mseq mismatch.
If mseq is left untouched, importing cap's mds_wanted later will get
reset by cap import message.
Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3569dd07aaad71920c5ea4da2d5cc9a167c1ffd4 upstream.
The Intel IOMMU driver opportunistically skips a few top level page
tables from the domain paging directory while programming the IOMMU
context entry. However there is an implicit assumption in the code that
domain's adjusted guest address width (agaw) would always be greater
than IOMMU's agaw.
The IOMMU capabilities in an upcoming platform cause the domain's agaw
to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports
both 4-level and 5-level paging. The domain builds a 4-level page table
based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In
this case the code incorrectly tries to skip page page table levels.
This causes the IOMMU driver to avoid programming the context entry. The
fix handles this case and programs the context entry accordingly.
Fixes: de24e55395698 ("iommu/vt-d: Simplify domain_context_mapping_one")
Cc: <stable@vger.kernel.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reported-by: Ramos Falcon, Ernesto R <ernesto.r.ramos.falcon@intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 574d356b7a02c7e1b01a1d9cba8a26b3c2888f45 upstream.
If the requested msize is too small (either from command line argument
or from the server version reply), we won't get any work done.
If it's *really* too small, nothing will work, and this got caught by
syzbot recently (on a new kmem_cache_create_usercopy() call)
Just set a minimum msize to 4k in both code paths, until someone
complains they have a use-case for a smaller msize.
We need to check in both mount option and server reply individually
because the msize for the first version request would be unchecked
with just a global check on clnt->msize.
Link: http://lkml.kernel.org/r/1541407968-31350-1-git-send-email-asmadeus@codewreck.org
Reported-by: syzbot+0c1d61e4db7db94102ca@syzkaller.appspotmail.com
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8ea3819c0bbef57a51d8abe579e211033e861677 upstream.
The cordic routine for calculating sines and cosines that was added in
commit 6f98e62a9f1b ("b43: update cordic code to match current specs")
contains an error whereby a quantity declared u32 can in fact go negative.
This problem was detected by Priit Laes who is switching b43 to use the
routine in the library functions of the kernel.
Fixes: 986504540306 ("b43: make cordic common (LP-PHY and N-PHY need it)")
Reported-by: Priit Laes <plaes@plaes.org>
Cc: Rafał Miłecki <zajec5@gmail.com>
Cc: Stable <stable@vger.kernel.org> # 2.6.34
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Priit Laes <plaes@plaes.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.
Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.
Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream.
According to comment in dlm_user_request() ua should be freed
in dlm_free_lkb() after successful attach to lkb.
However ua is attached to lkb not in set_lock_args() but later,
inside request_lock().
Fixes 597d0cae0f99 ("[DLM] dlm: user locks")
Cc: stable@kernel.org # 2.6.19
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c0174726c3976e67da8649ac62cae43220ae173a upstream.
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages")
Cc: stable@kernel.org # 3.5
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 23851e978f31eda8b2d01bd410d3026659ca06c7 upstream.
Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr")
Cc: stable@kernel.org # 3.1
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b982896cdb6e6a6b89d86dfb39df489d9df51e14 upstream.
If allocation fails on last elements of array need to free already
allocated elements.
v2: just move existing out_rsbtbl label to right place
Fixes 789924ba635f ("dlm: fix race between remove and lookup")
Cc: stable@kernel.org # 3.6
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cbb2ebf70daf7f7d97d3811a2ff8e39655b8c184 upstream.
In `create_composite_quirk`, the terminating condition of for loops is
`quirk->ifnum < 0`. So any composite quirks should end with `struct
snd_usb_audio_quirk` object with ifnum < 0.
for (quirk = quirk_comp->data; quirk->ifnum >= 0; ++quirk) {
.....
}
the data field of Bower's & Wilkins PX headphones usb device device quirks
do not end with {.ifnum = -1}, wihch may result in out-of-bound read.
This Patch fix the bug by adding an ending quirk object.
Fixes: 240a8af929c7 ("ALSA: usb-audio: Add a quirck for B&W PX headphones")
Signed-off-by: Hui Peng <benquike@163.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f4351a199cc120ff9d59e06d02e8657d08e6cc46 upstream.
The parser for the processing unit reads bNrInPins field before the
bLength sanity check, which may lead to an out-of-bound access when a
malformed descriptor is given. Fix it by assignment after the bLength
check.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1524f4e47f90b27a3ac84efbdd94c63172246a6f upstream.
The "chip->dsp_spos_instance" can be NULL on some of the ealier error
paths in snd_cs46xx_create().
Reported-by: "Yavuz, Tuba" <tuba@ece.ufl.edu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In chacha20-simd, clear the MAY_SLEEP flag in the blkcipher_desc to
prevent sleeping with preemption disabled, under kernel_fpu_begin().
This was fixed upstream incidentally by a large refactoring,
commit 9ae433bc79f9 ("crypto: chacha20 - convert generic and x86
versions to skcipher"). But syzkaller easily trips over this when
running on older kernels, as it's easily reachable via AF_ALG.
Therefore, this patch makes the minimal fix for older kernels.
Fixes: c9320b6dcb89 ("crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64")
Cc: linux-crypto@vger.kernel.org
Cc: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b8be5674fa9a6f3677865ea93f7803c4212f3e10 upstream.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4ecd55ea074217473f94cfee21bb72864d39f8d7 upstream.
After commit d202cce8963d, an expired cache_head can be removed from the
cache_detail's hash.
However, the expired cache_head may be waiting for a reply from a
previously submitted request. Such a cache_head has an increased
refcounter and therefore it won't be freed after cache_put(freeme).
Because the cache_head was removed from the hash it cannot be found
during cache_clean() and can be leaked forever, together with stalled
cache_request and other taken resources.
In our case we noticed it because an entry in the export cache was
holding a reference on a filesystem.
Fixes d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup")
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: stable@kernel.org # 2.6.35
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 06489cfbd915ff36c8e36df27f1c2dc60f97ca56 upstream.
Given the fact that devm_memremap_pages() requires a percpu_ref that is
torn down by devm_memremap_pages_release() the current support for mapping
RAM is broken.
Support for remapping "System RAM" has been broken since the beginning and
there is no existing user of this this code path, so just kill the support
and make it an explicit error.
This cleanup also simplifies a follow-on patch to fix the error path when
setting a devm release action for devm_memremap_pages_release() fails.
Link: http://lkml.kernel.org/r/154275557997.76910.14689813630968180480.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 808153e1187fa77ac7d7dad261ff476888dcf398 upstream.
devm_memremap_pages() is a facility that can create struct page entries
for any arbitrary range and give drivers the ability to subvert core
aspects of page management.
Specifically the facility is tightly integrated with the kernel's memory
hotplug functionality. It injects an altmap argument deep into the
architecture specific vmemmap implementation to allow allocating from
specific reserved pages, and it has Linux specific assumptions about page
structure reference counting relative to get_user_pages() and
get_user_pages_fast(). It was an oversight and a mistake that this was
not marked EXPORT_SYMBOL_GPL from the outset.
Again, devm_memremap_pagex() exposes and relies upon core kernel internal
assumptions and will continue to evolve along with 'struct page', memory
hotplug, and support for new memory types / topologies. Only an in-kernel
GPL-only driver is expected to keep up with this ongoing evolution. This
interface, and functionality derived from this interface, is not suitable
for kernel-external drivers.
Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b15c87263a69272423771118c653e9a1d0672caa upstream.
We have received a bug report that an injected MCE about faulty memory
prevents memory offline to succeed on 4.4 base kernel. The underlying
reason was that the HWPoison page has an elevated reference count and the
migration keeps failing. There are two problems with that. First of all
it is dubious to migrate the poisoned page because we know that accessing
that memory is possible to fail. Secondly it doesn't make any sense to
migrate a potentially broken content and preserve the memory corruption
over to a new location.
Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simply testcase
===
int main(void)
{
int ret;
int i;
int fd;
char *array = malloc(4096);
char *array_locked = malloc(4096);
fd = open("/tmp/data", O_RDONLY);
read(fd, array, 4095);
for (i = 0; i < 4096; i++)
array_locked[i] = 'd';
ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
if (ret)
perror("mlock");
sleep (20);
ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
if (ret)
perror("madvise");
for (i = 0; i < 4096; i++)
array_locked[i] = 'd';
return 0;
}
===
+ offline this memory.
In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
list
kernel: [<ffffffff81019ac9>] dump_trace+0x59/0x340
kernel: [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
kernel: [<ffffffff8101ac71>] show_stack+0x21/0x40
kernel: [<ffffffff8132bb90>] dump_stack+0x5c/0x7c
kernel: [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0
kernel: [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160
kernel: [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100
kernel: [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0
kernel: [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70
kernel: [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs]
kernel: [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200
kernel: [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0
kernel: [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660
kernel: [<ffffffff8120e50d>] __vfs_read+0xcd/0x140
kernel: [<ffffffff8120e9ea>] vfs_read+0x7a/0x120
kernel: [<ffffffff8121404b>] kernel_read+0x3b/0x50
kernel: [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0
kernel: [<ffffffff81215f08>] do_execve+0x28/0x30
kernel: [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130
kernel: [<ffffffff8161c045>] ret_from_fork+0x55/0x80
And that latter confuses the hotremove path because an LRU page is
attempted to be migrated and that fails due to an elevated reference
count. It is quite possible that the reuse of the HWPoisoned page is some
kind of fixed race condition but I am not really sure about that.
With the upstream kernel the failure is slightly different. The page
doesn't seem to have LRU bit set but isolate_movable_page simply fails and
do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds same set of
pages over and over again.
Fix both cases by explicitly checking HWPoisoned pages before we even try
to get reference on the page, try to unmap it if it is still mapped. As
explained by Naoya:
: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.
Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that. Naoya has noted the following:
: Theoretically no such gurantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also impli=
: es
: that the target page can still have PageLRU flag.
:
: /*
: * Torn down by someone else?
: */
: if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) {
: action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
: res =3D -EBUSY;
: goto out;
: }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.
Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.com>
Debugged-by: Oscar Salvador <osalvador@suse.com>
Tested-by: Oscar Salvador <osalvador@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream.
This changes the fork(2) syscall to record the process start_time after
initializing the basic task structure but still before making the new
process visible to user-space.
Technically, we could record the start_time anytime during fork(2). But
this might lead to scenarios where a start_time is recorded long before
a process becomes visible to user-space. For instance, with
userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
for an indefinite amount of time (and will, if this causes network
access, or similar).
By recording the start_time late, it much closer reflects the point in
time where the process becomes live and can be observed by other
processes.
Lastly, this makes it much harder for user-space to predict and control
the start_time they get assigned. Previously, user-space could fork a
process and stall it in copy_thread_tls() before its pid is allocated,
but after its start_time is recorded. This can be misused to later-on
cycle through PIDs and resume the stalled fork(2) yielding a process
that has the same pid and start_time as a process that existed before.
This can be used to circumvent security systems that identify processes
by their pid+start_time combination.
Even though user-space was always aware that start_time recording is
flaky (but several projects are known to still rely on start_time-based
identification), changing the start_time to be recorded late will help
mitigate existing attacks and make it much harder for user-space to
control the start_time a process gets assigned.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 60a161b7e5b2a252ff0d4c622266a7d8da1120ce upstream.
Suppose adapter (open) recovery is between opened QDIO queues and before
(the end of) initial posting of status read buffers (SRBs). This time
window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
causing by design looping with exponential increase sleeps in the function
performing exchange config data during recovery
[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.
Suppose an event occurs for which the FCP channel would send an unsolicited
notification to zfcp by means of a previously posted SRB. We saw it with
local cable pull (link down) in multi-initiator zoning with multiple
NPIV-enabled subchannels of the same shared FCP channel.
As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
status read buffers from within the adapter's ERP thread, the channel does
send an unsolicited notification.
Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted
status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
adapter->stat_work to re-fill the just consumed SRB from a work item.
Now the ERP thread and the work item post SRBs in parallel. Both contexts
call the helper function zfcp_status_read_refill(). The tracking of
missing (to be posted / re-filled) SRBs is not thread-safe due to separate
atomic_read() and atomic_dec(), in order to depend on posting
success. Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().
An obvious and seemingly clean fix would be to schedule stat_work from the
ERP thread and wait for it to finish. This would serialize all SRB
re-fills. However, we already have another work item wait on the ERP
thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
open port recoveries during zfcp auto port scan, but in fact it waits for
any pending recovery including an adapter recovery. This approach leads to
a deadlock. [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan
resiliency"); v2.6.37 commit d3e1088d6873
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb587
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a76
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963bd
("[SCSI] zfcp: Automatically attach remote ports")]
Instead make the accounting of missing SRBs atomic for parallel execution
in both the ERP thread and adapter->stat_work.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: <stable@vger.kernel.org> #2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e2ca26ec4f01486661b55b03597c13e2b9c18b73 ]
With PM enabled, I noticed that pressing a key on the droid4 keyboard will
block deeper idle states for the SoC. Let's fix this by using IRQF_ONESHOT
and stop constantly toggling the device OMAP4_KBD_IRQENABLE register as
suggested by Dmitry Torokhov <dmitry.torokhov@gmail.com>.
From the hardware point of view, looks like we need to manage the registers
for OMAP4_KBD_IRQENABLE and OMAP4_KBD_WAKEUPENABLE together to avoid
blocking deeper SoC idle states. And with toggling of OMAP4_KBD_IRQENABLE
register now gone with IRQF_ONESHOT, also the SoC idle state problem is
gone during runtime. We still also need to clear OMAP4_KBD_WAKEUPENABLE in
omap4_keypad_close() though to pair it with omap4_keypad_open() to prevent
blocking deeper SoC idle states after rmmod omap4-keypad.
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9ae4f8420ed7be4b13c96600e3568c144d101a23 ]
If "interface" is NULL then we can't release it and trying to will only
lead to an Oops.
Fixes: aea71a024914 ("[SCSI] bnx2fc: Introduce interface structure for each vlan interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ca92e173ab34a4f7fc4128bd372bd96f1af6f507 ]
sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the
hash mask.
Fixes: 28d8909bc790 ("[XFRM]: Export SAD info.")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|