- 22 Jan, 2019 4 commits
-
-
Jean-Philippe Brucker authored
Virtio allows resetting individual virtqueues. For legacy devices, this is done by writing an address of 0 into the PFN register. Modern devices have an "enable" register. Add an exit_vq() callback to all devices. A lot more work is required by each device to clean up its virtqueue state, and by the core to reset things like MSI routes and ioeventfds.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Jean-Philippe Brucker authored
To ease future changes to the core, replace get_pfn_vq() with get_vq(). This way, adding new generic operations on virtqueues won't require modifying every virtio device.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Jean-Philippe Brucker authored
Modern virtio requires devices to report how many queues they support. Add an operation to query all devices about their capacities.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Jean-Philippe Brucker authored
Modern virtio requires proper status handling and reset. A "notify_status" callback is already present in the virtio ops, but isn't implemented by any device. Instead they currently use "set_guest_feature" to reset the device and deal with endianness. This isn't sufficient for proper device reset, so add the notify_status callback to all devices that need it. To add useful hints like "start" and "stop", extend the status variable to 32 bits.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
[Julien T: Remove VIRTIO_CONFIG_S_NEEDS_RESET from config mask, as it is a virtio v1+ macro and kvmtool only implements v0.9, so this macro should not be referenced for now]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 02 Nov, 2018 1 commit
-
-
Jean-Philippe Brucker authored
After adding buffers to the virtio queue, the guest increments the avail index. It then reads the event index to check if it needs to notify the host. If the event index corresponds to the previous avail value, then the guest notifies the host. Otherwise it means that the host is still processing the queue and hasn't had a chance to increment the event index yet. Once it gets there, the host will see the new avail index and process the descriptors, so there is no need for a notification.
This is only guaranteed to work if both threads write and read the indices in the right order. Currently a barrier is missing from virt_queue__available(), and the guest may not see an up-to-date value of the event index after writing avail.
            HOST            |           GUEST
                            |
                            | write avail = 1
                            | mb()
                            | read event -> 0
    write event = 0         |   == prev_avail -> notify
    read avail -> 1         |
                            |
    write event = 1         |
    read avail -> 1         |
    wait()                  |
                            | write avail = 2
                            | mb()
                            | read event -> 0
                            |   != prev_avail -> no notification
By adding a memory barrier on the host side, we ensure that it doesn't miss any notification.
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
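The host-side ordering described in this commit can be sketched roughly as follows. This is a minimal illustration, not kvmtool's code: struct demo_queue and demo_queue_available() are hypothetical names, and a C11 sequentially consistent fence stands in for whatever barrier macro the real virt_queue__available() uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the indices shared with the guest. */
struct demo_queue {
    _Atomic uint16_t avail_idx;   /* written by the guest */
    _Atomic uint16_t avail_event; /* written by the host */
    uint16_t last_avail_idx;      /* host-private progress marker */
    bool event_idx;               /* event index feature negotiated? */
};

/* Returns true if the guest has queued buffers the host has not consumed. */
static bool demo_queue_available(struct demo_queue *q)
{
    if (q->event_idx) {
        /* Tell the guest how far the host has got... */
        atomic_store_explicit(&q->avail_event, q->last_avail_idx,
                              memory_order_relaxed);
        /*
         * ...and make that write visible before avail is read. Without
         * this full barrier the read below may be satisfied first, and
         * both sides can conclude that no notification is needed.
         */
        atomic_thread_fence(memory_order_seq_cst);
    }

    return atomic_load_explicit(&q->avail_idx, memory_order_relaxed)
           != q->last_avail_idx;
}
```

The guest performs the mirror-image sequence (write avail, barrier, read event), so with a full barrier on both sides at least one of them is guaranteed to observe the other's update.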
-
- 19 Mar, 2018 1 commit
-
-
Jean-Philippe Brucker authored
One barrier seems to be missing from kvmtool's virtio implementation, between virt_queue__available() and virt_queue__pop(). In the following scenario "avail" represents the shared "available" structure in the virtio queue:
            Guest               |               Host
                                |
    avail.ring[shadow] = desc_idx | while (avail.idx != shadow)
    smp_wmb()                     |     /* missing smp_rmb() */
    avail.idx = ++shadow          |     desc_idx = avail.ring[shadow++]
If the host observes the avail.idx write before the avail.ring update, then it will fetch the wrong desc_idx. Add the missing barrier.
This seems to fix the horrible bug I'm often seeing when running netperf in a guest (virtio-net + tap) on AMD Seattle. The TX thread reads the wrong descriptor index and either faults when accessing the TX buffer, or pushes the wrong index to the used ring. In that case the guest complains that "id %u is not a head!" and stops the queue.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
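The pairing of the guest's write barrier with the host's read barrier can be sketched with C11 atomics. This is only an illustration under stated assumptions: the demo_* names and the tiny ring size are hypothetical, a release store plays the role of smp_wmb() followed by the index write, and an acquire fence plays the role of the missing smp_rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8 /* hypothetical ring size for the sketch */

struct demo_avail {
    _Atomic uint16_t idx;
    uint16_t ring[DEMO_RING_SIZE];
};

/* Guest side: fill the ring slot first, then publish the new index. */
static void demo_guest_publish(struct demo_avail *avail, uint16_t *shadow,
                               uint16_t desc_idx)
{
    avail->ring[*shadow % DEMO_RING_SIZE] = desc_idx;
    /* Release ordering: the slot write cannot be seen after the index. */
    atomic_store_explicit(&avail->idx, ++*shadow, memory_order_release);
}

/* Host side: returns 1 and fills *desc_idx if an entry is available. */
static int demo_host_pop(struct demo_avail *avail, uint16_t *shadow,
                         uint16_t *desc_idx)
{
    if (atomic_load_explicit(&avail->idx, memory_order_relaxed) == *shadow)
        return 0;

    /* The barrier the commit adds: index read before ring read. */
    atomic_thread_fence(memory_order_acquire);

    *desc_idx = avail->ring[(*shadow)++ % DEMO_RING_SIZE];
    return 1;
}
```

Without the acquire fence, a weakly ordered CPU may read the ring slot before the index, which is the kind of stale descriptor read the commit message describes.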
-
- 29 Jan, 2018 2 commits
-
-
Jean-Philippe Brucker authored
Bad things happen when the VIRTIO_RING_F_EVENT_IDX feature isn't negotiated and we try to write the avail_event anyway. SeaBIOS, for example, stores internal data where avail_event should be [1].
Technically the Virtio specification doesn't forbid the device from writing the avail_event, and it's up to the driver to reserve space for it ("the transitional driver [...] MUST allocate the total number of bytes for the virtqueue according to [formula containing the avail event]"). But it doesn't hurt us to avoid writing avail_event, and kvmtool needs changes for interrupt suppression anyway, in order to comply with the spec. Indeed Virtio 1.0 cs04 says, in 2.4.7.2 Device Requirements: Virtqueue Interrupt Suppression:
"""
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
* The device MUST ignore the used_event value.
* After the device writes a descriptor index into the used ring:
  - If flags is 1, the device SHOULD NOT send an interrupt.
"""
So let's do that.
[1] https://patchwork.kernel.org/patch/10038931/
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
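The quoted rule can be sketched as a small decision function. This is an illustration rather than kvmtool's implementation: the demo_* names are hypothetical, the two constants mirror the values of VIRTIO_RING_F_EVENT_IDX (bit 29) and VRING_AVAIL_F_NO_INTERRUPT (1) from the Linux virtio_ring header, and the wrap-safe comparison follows the same arithmetic as that header's vring_need_event().

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F_EVENT_IDX          (1u << 29) /* VIRTIO_RING_F_EVENT_IDX */
#define DEMO_AVAIL_F_NO_INTERRUPT 1          /* VRING_AVAIL_F_NO_INTERRUPT */

/* Wrap-safe test: did the index move past 'event' between old and new? */
static bool demo_need_event(uint16_t event, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event - 1) < (uint16_t)(new_idx - old_idx);
}

/* Decide whether the device should interrupt the guest. */
static bool demo_should_signal(uint32_t features, uint16_t avail_flags,
                               uint16_t used_event,
                               uint16_t old_used, uint16_t new_used)
{
    if (!(features & DEMO_F_EVENT_IDX)) {
        /* Feature not negotiated: ignore used_event, honour the flag. */
        return !(avail_flags & DEMO_AVAIL_F_NO_INTERRUPT);
    }

    return demo_need_event(used_event, new_used, old_used);
}
```

Keeping the feature check in one place means the event fields are never consulted, or written, for drivers that did not negotiate the feature.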
-
Jean-Philippe Brucker authored
We're going to need the feature bits negotiated between host and guest in the core code. Save them in the virtio_device structure.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 09 Jun, 2017 1 commit
-
-
Andre Przywara authored
Currently we deny any VHOST_* functionality if the architecture supports guests with different endianness than the host. Most of the time, even on those architectures, the endianness of guest and host is the same, though, so we are denying the glory of VHOST needlessly.
Switch from compile-time determination to a run-time scheme, which takes the actual endianness of the guest into account. For this we change the semantics of VIRTIO_ENDIAN_HOST to return the actual endianness of the host (the endianness of kvmtool at compile time, really). The actual check in vhost_net now compares this against the guest endianness.
This enables vhost support on ARM and ARM64.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
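A minimal sketch of the run-time check described above, with hypothetical names throughout (demo_endian, DEMO_ENDIAN_HOST, demo_vhost_usable). It assumes a GCC- or Clang-style compiler that predefines __BYTE_ORDER__; how kvmtool actually derives its host endianness is not shown in this log.

```c
#include <stdbool.h>

enum demo_endian {
    DEMO_ENDIAN_LE = 1,
    DEMO_ENDIAN_BE = 2,
};

/* Host endianness, fixed when the tool is compiled (compiler assumption). */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_ENDIAN_HOST DEMO_ENDIAN_BE
#else
#define DEMO_ENDIAN_HOST DEMO_ENDIAN_LE
#endif

/*
 * vhost accesses the rings from the host kernel, so in this sketch it is
 * only offered when the guest's sampled endianness matches the host's.
 */
static bool demo_vhost_usable(enum demo_endian guest_endian)
{
    return guest_endian == DEMO_ENDIAN_HOST;
}
```

The point of the change is that the comparison happens per guest at run time, instead of vhost being disabled for every guest on a bi-endian architecture.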
-
- 17 Feb, 2017 1 commit
-
-
Will Deacon authored
When merging virtio-net buffers using the VIRTIO_NET_F_MRG_RXBUF feature, the first buffer added to the used ring should indicate the total number of buffers used to hold the packet. Unfortunately, kvmtool has a number of issues when constructing these merged buffers:
  - Commit 5131332e3f1a ("kvmtool: convert net backend to support bi-endianness") introduced a strange loop counter, which resulted in hdr->num_buffers being set redundantly the first time round
  - When adding the buffers to the ring, we actually add them one-by-one, allowing the guest to see the header before we've inserted the rest of the data buffers...
  - ... which is made worse because we non-atomically increment the num_buffers count in the header each time we insert a new data buffer
Consequently, the guest quickly becomes confused in its net rx code and the whole thing grinds to a halt. This is easily exemplified by trying to boot a root filesystem over NFS, which seldom succeeds.
This patch resolves the issues by allowing us to insert items into the used ring without updating the index. Once the full payload has been added and num_buffers corresponds to the total size, we *then* publish the buffers to the guest.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
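The "fill first, publish once" approach can be sketched as two helpers. The names, the ring size and the structure layout below are hypothetical simplifications, not the functions this commit adds; the sketch only shows the split between writing used-ring slots and advancing the shared index.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8 /* hypothetical ring size for the sketch */

struct demo_used_elem {
    uint32_t id;  /* head of the descriptor chain */
    uint32_t len; /* bytes written into it */
};

struct demo_used {
    _Atomic uint16_t idx; /* the only field the guest polls */
    struct demo_used_elem ring[DEMO_RING_SIZE];
};

/* Fill the slot at idx + offset, leaving the published index untouched. */
static void demo_set_used_elem_no_update(struct demo_used *used,
                                         uint16_t offset,
                                         uint32_t head, uint32_t len)
{
    uint16_t idx = atomic_load_explicit(&used->idx, memory_order_relaxed);
    uint16_t slot = (uint16_t)(idx + offset) % DEMO_RING_SIZE;

    used->ring[slot].id = head;
    used->ring[slot].len = len;
}

/* Publish 'count' previously filled slots to the guest in one step. */
static void demo_used_idx_advance(struct demo_used *used, uint16_t count)
{
    uint16_t idx = atomic_load_explicit(&used->idx, memory_order_relaxed);

    /* Release ordering: slot contents become visible before the index. */
    atomic_store_explicit(&used->idx, (uint16_t)(idx + count),
                          memory_order_release);
}
```

A merged receive would call the first helper once per buffer, write the final num_buffers into the header, and only then call the second helper with the buffer count.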
-
- 01 Jun, 2015 26 commits
-
-
Suzuki Poulose authored
lkvm by default sets up a virtio-pci transport for network, if none is specified. This can be a problem on architectures (e.g. ARM64) where virtio-pci is not supported yet, and causes the following warning at exit:
    # KVM compatibility warning.
    virtio-net device was not detected.
This patch changes it to make use of the default transport method for the architecture when none is specified. This will ensure that on every arch we get the network up by default in the VM.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Marc Zyngier authored
Add a utility function that transfers the endianness sampled at device reset time to a queue being set up.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Marc Zyngier authored
Save the CPU endianness when the device is reset. It is widely assumed that the guest won't change its endianness afterwards, or at least not without resetting the device first. A default implementation of the endianness sampling just returns the default "host endianness" value so that unsuspecting architectures are not affected.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Marc Zyngier authored
Define a simple infrastructure to configure a virt_queue depending on the guest endianness, as reported by the feature flags. At this stage, the endianness is always the host's. Wrap all accesses to virt_queue data structures shared between host and guest with byte swapping helpers. Should the architecture only support one endianness, these helpers are reduced to the identity function.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
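The kind of accessor this commit describes might look like the following sketch. The demo_vq structure, its swap flag and the helper names are all hypothetical; the log does not show how kvmtool names or stores these.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue state: set when guest and host byte order differ. */
struct demo_vq {
    bool swap;
};

static uint16_t demo_swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Convert a 16-bit field read from guest-shared memory to host order. */
static uint16_t demo_guest_to_host_u16(const struct demo_vq *vq, uint16_t v)
{
    return vq->swap ? demo_swab16(v) : v;
}

/* Convert a host-order value before storing it in guest-shared memory. */
static uint16_t demo_host_to_guest_u16(const struct demo_vq *vq, uint16_t v)
{
    return vq->swap ? demo_swab16(v) : v;
}
```

On an architecture with a single byte order the flag would always be false, so both helpers collapse to the identity function, as the commit message notes.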
-
Sasha Levin authored
Commit "kvm tools: virtio: remove hardcoded assumptions about guest page size" introduced a bug that prevented guests with more than 4GB of RAM from booting. The issue is that 'pfn' is a 32-bit integer, so multiplying it by the page size to get the actual page address overflows if the pfn refers to a memory area above 4GB.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
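The overflow is easy to reproduce in isolation. In this sketch the function names are hypothetical; the first variant multiplies in 32 bits and wraps, while the second widens the pfn before multiplying.

```c
#include <stdint.h>

/* Wrapping variant: the product is computed in 32 bits, then widened. */
static uint64_t demo_pfn_to_addr_wrapping(uint32_t pfn, uint32_t page_size)
{
    return pfn * page_size;
}

/* Widening variant: promote the pfn first so the product is 64-bit. */
static uint64_t demo_pfn_to_addr(uint32_t pfn, uint32_t page_size)
{
    return (uint64_t)pfn * page_size;
}
```

With 4k pages, pfn 0x100000 is the first page at 4GB: the wrapping variant returns 0 for it, while the widening variant returns 0x100000000.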
-
Sasha Levin authored
Some devices want to know their status; use this hook to allow them to get that notification.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Will Deacon authored
virtio-based PCI devices deal only with 4k memory granules, making direct use of the VIRTIO_PCI_VRING_ALIGN and VIRTIO_PCI_QUEUE_ADDR_SHIFT constants when initialising the virtqueues for a device. For MMIO-based devices, the guest page size is arbitrary and may differ from that of the host (this is the case on AArch64, where both 4k and 64k pages are supported). This patch fixes the virtio drivers to honour the guest page size passed when configuring the virtio device and align the virtqueues accordingly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Sasha Levin authored
Instead of a get/set for config values, just request the address of the config region, and handle that by simply reading directly from that region.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
If vhost is enabled for a virtio device, vhost will poll the ioeventfd on the kernel side and there is no need to poll it in userspace. Otherwise, both the vhost kernel side and userspace will race to poll.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
This patch introduces a helper, virtio_compat_add_message(), to simplify adding a compat message for a virtio device.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
This patch reworks the virtio transport abstraction.
* Move virtio transport operations to virtio operations and drop virtio/trans.c
  This makes the abstraction much cleaner.
* Rename struct virtio_trans to struct virtio_device
    struct virtio_trans {
        void *virtio;
        enum virtio_trans_type type;
        struct virtio_trans_ops *trans_ops;
        struct virtio_ops *virtio_ops;
    };
    struct virtio_device {
        void *virtio;
        struct virtio_ops *ops;
    };
  The virtio_trans struct is a bit confusing since it also includes virtio operations.
* Introduce virtio_init()
  To init a device, e.g.
  Before:
    virtio_trans_init()
    ndev->vtrans.trans_ops->init()
    ndev->vtrans.virtio_ops = &net_dev_virtio_ops
  After:
    virtio_init()
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Sasha Levin authored
Rusty has just removed it from the spec. Since we are probably the only ones who implemented support for it, we should remove it from our code as well. There is no issue with breaking anything since nothing else worked with it, so it's fully backwards compatible.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
There are at most bdev->reqs[VIRTIO_BLK_QUEUE_SIZE] outstanding requests at any time. We can simply use the head of each request to fetch the right 'struct blk_dev_req' in bdev->reqs[]. So, we can eliminate the list and lock operations which were introduced by virtio_blk_req_{pop, push}.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
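The head-indexed lookup can be sketched as below. The structure layouts, the queue size value and the function name are hypothetical stand-ins; only the idea of indexing a fixed array by descriptor head comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BLK_QUEUE_SIZE 256 /* hypothetical queue size */

struct demo_blk_req {
    uint16_t head; /* head of the descriptor chain for this request */
};

struct demo_blk_dev {
    /* One preallocated slot per possible descriptor head. */
    struct demo_blk_req reqs[DEMO_BLK_QUEUE_SIZE];
};

/*
 * Each in-flight request owns a distinct descriptor head, so the head
 * itself picks a unique slot: no free list and no lock are needed.
 */
static struct demo_blk_req *demo_req_for_head(struct demo_blk_dev *bdev,
                                              uint16_t head)
{
    assert(head < DEMO_BLK_QUEUE_SIZE);

    struct demo_blk_req *req = &bdev->reqs[head];
    req->head = head;
    return req;
}
```

The trade-off is a fixed array sized to the queue, in exchange for constant-time lookup with no shared list to protect.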
-
Asias He authored
This patch also fixes the fio seq-read hang problem.
    root@guest-kvm:~# cat seq-read.fio
    [seq-read]
    rw=read
    bs=4096
    size=512m
    direct=1
    filename=/dev/vdb
    root@guest-kvm:~# fio seq-read.fio
    random-read: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
    Starting 1 process
    Jobs: 1 (f=1): [R] [50.0% done] [0K/0K /s] [0/0 iops] [eta 00m:27s]
Acked-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Sasha Levin authored
This patch is the base for enabling support for the event index feature in the virtio spec. We do so by updating and evaluating the used/avail event idx in the virtio ring functions. Actual usage of this flag is in the following patches. The result is fewer notifications between the guest and host and, consequently, faster operation of the virt queues.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
This function is not used anymore. Instead, we are using virtio_pci__signal_vq() to trigger the interrupt right now.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Sasha Levin authored
This patch adds a helper used to retrieve the type of field used when the guest is writing to or reading from the virtio config space. Since the config space is dynamic, it may change at runtime, so we must calculate it before every read/write.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Aneesh Kumar K.V authored
This adds the in and out iovec to a separate array.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
Sasha Levin authored
Instead of redefining virtio pci constants (or not using them at all), use the constants from the kernel header.
Acked-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Sasha Levin authored
queue->pfn may be used to point at addresses larger than 32 bits. Prevent a wraparound when shifting it left.
Acked-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
Inject an IRQ into the guest only when the ISR status is low, which means the guest has read the ISR status and the device has cleared this bit as a side effect of the read. This avoids a lot of unnecessary IRQ injections from device to guest.
A Netperf test shows that this patch changes the host-to-guest bandwidth from 2866.27 Mbps (cpu 33.96%) to 5548.87 Mbps (cpu 53.87%), and the guest-to-host bandwidth from 1408.86 Mbps (cpu 99.9%) to 1301.29 Mbps (cpu 99.9%). The bottleneck of the guest-to-host bandwidth is guest CPU power.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin authored
Clean uint*_t type from the code.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Amos Kong authored
virtio_console__inject_interrupt tries to use virt queues before the guest tells us to initialize them.
    (gdb) r run -i linux-0.2.img -k ./vmlinuz-2.6.38-rc6+ -r ./initrd.img-2.6.38-rc6+ -p=init=1 -m 500 -c
    Starting program: /project/rh/kvm-tools/tools/kvm/kvm run -i linux-0.2.img -k ./vmlinuz-2.6.38-rc6+ -r ./initrd.img-2.6.38-rc6+ -p=init=1 -m 500 -c
    [Thread debugging using libthread_db enabled]
    [New Thread 0x7fffd6e2d700 (LWP 19280)]
    Warning: request type 8
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004026ca in virt_queue__available (vq=0x60d3c8) at include/kvm/virtio.h:31
    31 return vq->vring.avail->idx != vq->last_avail_idx;
    (gdb)
    (gdb) bt
    (gdb) p *vq
    $2 = {vring = {num = 0, desc = 0x0, avail = 0x0, used = 0x0}, pfn = 0, last_avail_idx = 0}
include/kvm/virtio-console.h:
    59 void virtio_console__inject_interrupt(struct kvm *self)
    ....
    71 if (term_readable(CONSOLE_VIRTIO) && virt_queue__available(vq)) {
    72 head = virt_queue__get_iov(vq, iov, &out, &in, self);
    ^^^^ then this block will not be executed if virtio_queue is unavailable.
Changes from v1:
- move the check of virt_queue out of virt_queue__get_iov()
Reported-by: Amos Kong <akong@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
Use virt_queue__set_used_elem instead.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
Asias He authored
This patch moves common virtio code to virtio.c and virtio.h.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-