staging: Add ext-zones protocol for area-limited window positioning
Hello everyone!
This is a new attempt to resolve the issues clients designed for stacking window managers are facing when they want to set their own window positions in a specific order. Please check out !247 (closed) for a rationale with examples, and #72 for the (old) original feature request.
Prior approaches
The initial protocol proposal which used absolute monitor-based coordinates was NACK'ed by Weston (and GNOME later), so it is in the ext namespace now (see !247 (closed)). Before pursuing that protocol for inclusion in staging under the ext namespace, I would really like to find a more universally applicable solution for this problem though - maybe we can even improve the design instead of doing what X11 did.
So, a different attempt was born, using relative positioning, where windows position themselves in relation to other windows: !249 (closed) - This got a lot of good feedback that I had to think about, with a bunch of corner case issues as well as more fundamental problems raised. At the same time I also got exposed to a few more styles of application that people would like to have supported, and ultimately after trying to implement a bit of that protocol, I came to the conclusion that it would just make absolutely everybody unhappy with a compromise-solution:
- Application porters from other OSes and X11 would be unhappy as there would be no way to really map existing behavior to the Wayland world
- App developers would be unhappy as it limits their application's design to what paradigms the compositor supports with regards to window placement (e.g. "top", "left" anchor semantics - what if the app wanted to place two windows centered at the top?)
- Compositor developers would be unhappy as they would need to implement a lot of style logic and alignment paradigms in their compositor, while still having no idea what the application was actually trying to achieve.
The protocol proved to be pretty good at implementing a GIMP-ish clone, but fell apart as soon as I tried to do anything more complex. I don't think the idea is dead overall, placing windows in relation to each other is probably still useful, but I think this new protocol may be a better solution overall.
Introducing "zones"
This explanation has been changed to more closely match the current proposal and give a better introduction, as it has been heavily altered during review (and will for sure continue to be edited, so please read the patch to be up to date!).
The new protocol in this MR introduces the concept of a "zone", a new per-client coordinate system, provided by the compositor and attached to one output, in which it can place its windows. The client can only know window positions relative to the zone the compositor has assigned to it, and a zone can be a defined rectangle with fixed dimensions, or an infinite space without any limits.
A zone can be reshaped by the compositor at any time, but must always include all windows of the client that were assigned to it. Every client may only have one zone, and can share the zone with trusted other processes by sharing its handle with them. That way, any external process that is contributing a window can create its window in relation to the other windows by sharing the same coordinate system (as long as the clients trust each other to exchange tokens).
A zone is a per-client entity, and clients must not assume it reflects any real object like the monitor geometry.
[ I am not sold on the "zone" name - it's better than the previous name, "workspace", as that term was overloaded with meaning already, but if you have a better idea than "zone", please let me know! "Window group" was another option, but that also already has different specific connotations. ]
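To make the zone idea a bit more concrete, here is a minimal client-side model in Python. It is only an illustration: the `Zone` class, its fields and the `centered_at_top` helper are hypothetical names made up for this sketch and are not part of the protocol. It assumes an unbounded direction can be represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Zone:
    """Hypothetical client-side view of a zone; not a type from the protocol."""
    width: Optional[int]   # None = zone is unbounded horizontally
    height: Optional[int]  # None = zone is unbounded vertically

    def contains(self, x: int, y: int, w: int, h: int) -> bool:
        """True if a w*h window at zone-local (x, y) lies fully inside the zone."""
        if x < 0 or y < 0:
            return False
        fits_x = self.width is None or x + w <= self.width
        fits_y = self.height is None or y + h <= self.height
        return fits_x and fits_y


def centered_at_top(zone: Zone, win_w: int) -> Tuple[int, int]:
    """Zone-local position that centers a window along the top edge.

    Only meaningful when the zone has a finite width.
    """
    if zone.width is None:
        raise ValueError("cannot center in a zone without a finite width")
    return ((zone.width - win_w) // 2, 0)


# A fixed rectangle (e.g. the usable area of one output) ...
finite = Zone(width=1920, height=1040)
# ... and a zone that is left open horizontally.
scrolling = Zone(width=None, height=1040)
```

The client only ever works with zone-local numbers like these; where the zone sits on the output stays the compositor's business.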
So, here is what this could look like with different configurations:
Simplest Case: In this case, where there is only a single monitor and a stacking window manager, the client would just receive a zone rectangle that encompasses the usable screen area, so any space that is not restricted for usage by windows due to shell elements being in the way. The green rectangle is the zone, with the yellow star being its coordinate origin (0, 0).
Simple Multi-Monitor Case: In this case we have two monitors, which may have different resolutions. Any zone is always attached to one output, so in order for the client to control window placement, it must create a second zone on the second monitor and handle the transition of a window between monitors (see below). A compositor can prevent zones from being created on specific outputs, in which case the client could not move windows to the respective output (and windows can only be moved there manually by the user or by the compositor).
Multiple Client Zones Case: This is an interesting new idea where the compositor could determine that the current desktop is already too cluttered and carve out a smaller box of it for the new application to place its windows in. The application will then try to fill that designated space. If there are multiple multi-window apps, they may get multiple zone rectangles. I expect this to be useful for ultrawide or other extremely large displays (and potentially for tilers as well, although those might simply not want to implement this protocol at all).
Infinite Zone Case: In case we have a scrolling compositor, we may have a finite height but infinite width (or vice versa). In that case, the compositor can communicate that fact to the client by leaving the zone open in one direction. That way, the application can stretch out a bit and use the extra space if it needs it. Of course, such a compositor may also just restrict the zone to 1x the current monitor geometry instead. In every case, the client can not expect a position request to be followed exactly, so the compositor is allowed to rein in protocol abuse, such as an app placing a window insanely far away from the user's current position.
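As a rough illustration of how a compositor might rein in such requests in an open-ended zone, here is one possible policy sketched in Python. The function name, its parameters and the idea of a fixed "maximum distance from the viewport" are assumptions for this example; the protocol does not prescribe any particular policy.

```python
def clamp_to_reach(requested_x: int, viewport_x: int, viewport_width: int,
                   max_distance: int) -> int:
    """Limit a requested x position to within max_distance of the viewport.

    All values are zone-local. viewport_x is where the user's current view
    starts inside the (horizontally unbounded) zone. Hypothetical policy,
    not something the protocol defines.
    """
    lowest = viewport_x - max_distance
    highest = viewport_x + viewport_width + max_distance
    return max(lowest, min(requested_x, highest))
```

A request within reach is honoured unchanged, while a request far outside is pulled back to the nearest allowed position; the client then learns the actual result from the compositor.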
Window movement "edge cases"
In the first example, the user moves a window out of the current monitor onto a second monitor, but the application still wants to position other windows relative to it / needs it in a zone. For that to happen, the following steps happen:
- The compositor emits a zone_left event to notify the client that a window has left its assigned zone and the zone association is broken
- The client could ignore this if it does not care about positioning anymore, but in this case it creates a new zone for the respective output via get_zone
- If it received a valid zone, it associates the just moved window with the zone on the second monitor via get_position. This will also give it the position of the moved window relative to its new zone.
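The three steps above can be sketched as a small simulation. This is not real Wayland client code: `FakeCompositor`, `Client` and all method signatures are hypothetical stand-ins that only borrow the names zone_left, get_zone and get_position from the description. How the client learns which output the window ended up on is outside this sketch, so `new_output` is simply passed in.

```python
from typing import Dict, Optional, Tuple


class FakeCompositor:
    """Stand-in for the compositor side; every method here is hypothetical."""

    def __init__(self, allowed_outputs, window_positions):
        self.allowed_outputs = set(allowed_outputs)
        # (output, window) -> position relative to a zone on that output
        self.window_positions = dict(window_positions)

    def get_zone(self, output: str) -> Optional[str]:
        # A compositor may refuse to create zones on some outputs.
        return f"zone@{output}" if output in self.allowed_outputs else None

    def get_position(self, zone: str, window: str) -> Tuple[int, int]:
        output = zone.split("@", 1)[1]
        return self.window_positions[(output, window)]


class Client:
    def __init__(self, compositor: FakeCompositor):
        self.compositor = compositor
        self.zone_of: Dict[str, str] = {}
        self.position_of: Dict[str, Tuple[int, int]] = {}

    def on_zone_left(self, window: str, new_output: str) -> bool:
        # Step 1: the old zone association is broken.
        self.zone_of.pop(window, None)
        self.position_of.pop(window, None)
        # Step 2: ask for a zone on the output the window moved to.
        zone = self.compositor.get_zone(new_output)
        if zone is None:
            return False  # no zone there; the client cannot position this window
        # Step 3: associate the window with the new zone and learn its position.
        self.zone_of[window] = zone
        self.position_of[window] = self.compositor.get_position(zone, window)
        return True


compositor = FakeCompositor({"HDMI-1"}, {("HDMI-1", "toolbox"): (120, 80)})
client = Client(compositor)
client.zone_of["toolbox"] = "zone@eDP-1"
moved = client.on_zone_left("toolbox", "HDMI-1")
```

If the compositor refuses to hand out a zone for the new output, the window simply stays unassociated and the client can no longer place it.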
In the second example, a window is moved out of the zone on the same monitor, but the client still wants to know the window's position. In this case, the client can request get_position using its zone on the output and thereby make the compositor extend the zone to once again encompass all of the application's windows.
If the zone can not be extended (if it already hits the top-left window border and the window is moved outside of that boundary by the user), the positions returned by position events in response to get_position might be negative.
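The arithmetic behind this can be sketched from the compositor's point of view. The helper names and the idea of a "limit" point beyond which the zone origin cannot move are assumptions for illustration; the coordinates on the input side are compositor-internal and never shown to the client.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def extend_origin(origin: Point, window: Point, limit: Point) -> Point:
    """Move the zone origin up/left to cover the window, but not past limit."""
    return (max(limit[0], min(origin[0], window[0])),
            max(limit[1], min(origin[1], window[1])))


def zone_local(origin: Point, windows: Dict[str, Point]) -> Dict[str, Point]:
    """Translate compositor-internal window positions into zone-local ones."""
    return {name: (x - origin[0], y - origin[1])
            for name, (x, y) in windows.items()}


# Zone origin at (100, 50); the user drags "palette" to the left of the zone.
windows = {"main": (300, 200), "palette": (40, 60)}
origin = extend_origin((100, 50), windows["palette"], limit=(0, 0))
after_extend = zone_local(origin, windows)

# The user drags "palette" past the boundary: the zone cannot follow.
windows["palette"] = (-30, 60)
origin = extend_origin(origin, windows["palette"], limit=(0, 0))
past_limit = zone_local(origin, windows)
```

In the first case the origin moves to (40, 50), so "main" is reported at (260, 150) instead of (200, 150) even though it never moved. In the second case the origin stops at x = 0 and "palette" is reported with a negative x coordinate.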
Advantages
The advantages of this protocol are:
- No global coordinate system
- Multi-process GUI applications can easily cooperate in window placement across their processes
- No limitations on the window layouts applications can come up with (Clients can easily construct their initial window layout and do not require the compositor to make assumptions about it)
- Relatively easy to port existing applications to the new protocol from Windows/macOS/X11
- Context for compositors: They now know the explicit layout of windows a client has created, and that these windows belong together, so can decide to e.g. allow the user to move them as a cluster between virtual zones, or represent them as one in a tiling WM and potentially only expand them if selected.
- Clients have a better idea about the usable space available to them and might make much better placement decisions than on X11
- Still, compositors have the final say about window placement and application hints are only strong recommendations
Disadvantages
- The protocol is a lot more complex now, and the compositor will have to juggle more coordinate systems
- If the user moves a window outside of the compositor's selected zone bounds, the compositor needs to adapt the zone size, so the client does not receive invalid window locations and so the zone still encompasses all the client's windows. This means every single window in the zone changes its position (from the client's POV) if the window was moved out of the left or top side of the zone, which is a bit messy. But applications should be able to handle this, as they already have to deal with similar cases on X11.
- Within a zone, the compositor still does not have a lot of context of what the windows actually do - but I do not think there is any viable scalable solution to convey that information.
Please let me know what you think! I am aware that this protocol will be just as controversial as the other one, but I do think this one would be a better compromise than the previous relative alignment approach.
FAQ
Why do we need to allow clients to position their windows / set layer rules?
Very briefly:
- Many professional and multi-screen applications are built around having a way to position their windows for good usability (see !247 (closed) for a few examples). Those apps work on Windows, MacOS and X11, but do not work well or at all on Wayland. New apps like these are also being created all the time.
- Toolkit use / portability: API like this is available on all desktop platforms, except Wayland. For certain apps this adds an extreme burden for porting their apps to Linux/Wayland which many do not want to do. Cross-platform toolkits are also in a rough place, because Wayland is the "odd one out" API that does not support an otherwise common interface.
- Web compatibility: For professional usecases, there is a W3C draft to allow web apps to position windows (with appropriate permission control). See https://w3c.github.io/window-management/
- Compatibility with WINE and other emulators: Since API like this exists on all desktop platforms except Wayland, translation/compatibility layers like WINE can not create matching behavior on Wayland without compositor support. This is a problem primarily for larger Windows apps running through WINE, but can also affect some games.
- Professional apps like DAWs rely a lot on having windows "always on top" floating above the other UI elements. On Wayland, this is not possible and the usability of these apps suffers a lot.
Why not implement a screen-relative, absolute coordinate system for full compatibility with X11/Windows/MacOS?
This was attempted at !247 (closed), but in brief, many compositors do not want to expose a global coordinate system relative to the connected displays. It also does not work with specialty compositors, such as XR/VR compositors and compositors providing an infinite surface. Ideally, a Wayland protocol should be a bit more versatile than the X11 solution was.
Why not create a protocol for semantic relative positioning hints (no coordinates, just top/left/right/middle relative hints)?
This was explored in !249 (closed) with a few prototypes. There were three issues with it:
- Fully semantic positioning does not solve all application cases: Apps do sometimes want to display a window next to a UI element or at a specific position to their UI. So we need to give them relative coordinates to other windows at least.
- Toolkit developers and app developers hated it, as it creates a complete special-case that is unlike anything on other platforms and needs to be specifically supported by toolkits somehow. It not being able to cover all existing, in-use usecases didn't make it particularly loved.
- It does not work for emulators and tools like WINE at all, for DAWs and layering it had no solution that couldn't be abused to trick users and it would not have worked with the W3C draft either.
Why not create a protocol for toplevel-relative positioning hints (with coordinates, but relative to other toplevels the app controls)?
This was also considered while trying to salvage !249 (closed) and was discarded for these reasons:
- It is still an interface that isn't cleanly mappable for cross-platform toolkits, although it's easier to implement than semantic positioning
- It does not work with tools like WINE either
- It is a fairly complex protocol and...
- ...clients can easily abuse such a protocol to get fully absolute positioning anyway: Create a window, maximize it, place all other windows relative to it, then destroy the backing window. Done. All windows are now placed in absolute coordinates, and with some background math, knowing the positions of windows in relation to each other, the app can even absolutely place newer windows without measuring the available display region again. So, it's a complex protocol that is easily fooled.
ext-zones resolves this in a way involving fewer hacks.
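The "background math" in the abuse scenario above is just offset bookkeeping. A small sketch, using made-up helper names and assuming the maximized anchor window sits at the origin of the usable area:

```python
from typing import Dict, Tuple

Point = Tuple[int, int]

# Positions the client has worked out, in the coordinate space of the
# (maximized) anchor window. Purely client-side bookkeeping.
known: Dict[str, Point] = {}


def place_relative(name: str, reference: str, offset: Point) -> Point:
    """Record a window placed at `offset` relative to an already known window."""
    ref = known[reference]
    known[name] = (ref[0] + offset[0], ref[1] + offset[1])
    return known[name]


def offset_for_absolute(target: Point, reference: str) -> Point:
    """Relative offset needed to land a new window at an absolute target."""
    ref = known[reference]
    return (target[0] - ref[0], target[1] - ref[1])


known["anchor"] = (0, 0)                          # maximized backing window
place_relative("toolbox", "anchor", (1500, 40))   # effectively absolute
del known["anchor"]                               # backing window destroyed

# Later: place a new window at absolute (200, 600) using only "toolbox".
offset = offset_for_absolute((200, 600), "toolbox")
place_relative("timeline", "toolbox", offset)
```

Once a single window position is known in "absolute" terms, every later relative request can be turned into an absolute one, which is why a toplevel-relative protocol does not actually hide absolute positioning from a determined client.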
Can't we rewrite all apps making use of window positioning as MDI apps?
Besides this being a gargantuan task for client developers (change the whole UX!) that they will only have to do for Wayland and no other platform, it is also no solution. MDI has multiple issues (see https://stackoverflow.com/questions/486020/is-there-still-a-place-for-mdi for just one summary), paramount being it not working well with multiple monitors. Usually apps using window placement are designed for larger or multiple monitors.
Implementing "magnetic windows" with this protocol looks annoying, with lots of polling...
For this particular case, it is a lot better to tell the compositor about the two windows being docked together so it can perform the appropriate actions. Ideas for a new protocol specifically for this are being discussed here: #198
Is there a simple demo application to play with multi-window patterns?
Most of the apps using this are really large and there isn't one that just uses all multi-window patterns at once, so I created a small demo tool at https://github.com/ximion/multiwindow-pattern-demo for experimentation purposes.
This protocol is being discussed in a vacuum, isn't there any real-life testing?
We have a prototype implementation for KWin available at https://invent.kde.org/apol/kwin-zones as a plugin. The protocol is also already used by Mercedes for internal development on their IVI systems (surprisingly...) as well as by one other company for testing with a research device control interface (unfortunately I can't share more).
Currently, direct toolkit implementations are a bit of a pending issue, as, depending on the toolkit, they involve a bit of work on the abstraction layer if the toolkit's current design is screen-centric, so some implementation and integration work for Wayland and this protocol is still needed. That work should be doable though (to be proven! ^^). There is also of course hesitation to do the work until major compositors indicate they will support the protocol into the future.
Will this work in XR/VR or with compositors that have no coordinate system to share?
Yes! The protocol makes the compositor share a rectangle in which the client can place windows, which apparently maps well to XR compositors. Other compositors can also provide these zones, without needing to expose any global coordinate system.
How will this work for autotiling compositors?
Tiling compositors take full control of window management, while this protocol allows clients to advise window management. These two concepts do not work that well together. Depending on their policy and individual implementation, tiling compositors have the following options if this protocol gets merged:
- Not implement ext-zones: This way, clients have a clear indication that positioning and layering hints are not supported on the current compositor.
- Implement ext-zones, but reject all positioning hints and only deal with layering hints: Less clear for clients, but they will still deal with a scheme like this. The compositor can get some information about a client's intent this way.
- Implement ext-zones and use the positioning hints from the client as advisory information to arrange windows in the tiling grid.
- Implement ext-zones and give the client in question the geometry of a tile to place its windows in, effectively tiling the zone instead of the window. That way, both the client and the compositor can meet half way.
- Implement ext-zones, but as soon as a client has more than one window in a zone and has made a placement request, offer the user to switch that client to stacking mode or simply switch the client's windows to stacking mode automatically for compatibility.
- For compositors which can switch between stacking & autotiling: Set compositor-defined positions and layouts in tiling mode, but keep the application's zone preferences in mind. When switching back to stacking mode, restore the original positions set by the client/user.
Any one of these options is perfectly valid. It is known that any client-side position hints clash a bit with the concept of an autotiling WM, and that is okay. The protocol is not forced upon compositors that do not want to implement it because it would not make sense for their design.
What's the scaling of a zone?
A zone uses logical pixels / the same coordinate system as the client's surfaces. Once created, the zone coordinate system can not be changed. If the client adds toplevels using a different coordinate system (or lets a client join the zone that uses different coordinates), it is considered an application bug, or the client has to work around it.
Are position hints mandatory?
No, they are advisory. The compositor can override a client positioning hint at any time if it conflicts with compositor policy, and the client has to deal with the result. Of course, in most cases positioning requests should succeed, as the compositor-provided zone should be an area where the compositor generally allows placement of windows.
It is generally disallowed to move a window that is interacted with (being dragged/resized by the user).
Are layering hints mandatory?
Yes, within reason. All layer hints only apply to the current zone (and the items in it), and the compositor may ignore a layering hint in case it does not make sense for the window management style it uses (e.g. for a tiling compositor, there are no layers / overlapping windows usually).
Requirements for merging
- Review (many, but formal LGTM still missing)
- Implementations (2+)
  - Qt: TBD
  - KWin: TBD, prototype in progress @ https://invent.kde.org/apol/kwin-zones
  - SDL: in progress @ https://github.com/Kontrabant/SDL/tree/wl_ext_zones
- ACKs from members (2+)
  - TBD