WebXR’s Game of the Year targets Apple Vision Pro’s gaze-and-pinch

While working to support eye and hand tracking on Vision Pro from the same URL, technical director James C. Kane explores issues around design, tech and privacy.

There are pros and cons to Vision Pro’s total reliance on gesture and eye-tracking, but when it works in context, gaze-based input feels like the future we’ve been promised — like a computer is reading your mind in real-time to give you what you want. This first iteration is imperfect given the many valid privacy concerns in play, but Apple has clearly set a new standard for immersive user experience going forward.

They’ve even brought this innovation to the web in the form of Safari’s transient-pointer input. Apple’s announcement of beta support for cross-platform WebXR specifications on Vision Pro raised some eyebrows, given how capable the modern browser has become. While some limitations remain, instant global distribution of immersive experiences without App Store curation or fees makes this an incredibly powerful, interesting and accessible platform.

We’ve been exploring this channel for years at Paradowski Creative. Our agency builds immersive experiences for global brands including Sesame Street, adidas and Verizon, and we also invest in original, award-winning content. The Escape Artist, our VR escape game, lets you play as a muse trapped inside an artist’s work, solving puzzles and deciphering hints to escape and find inspiration. The game has been featured on the homepage of every Meta Quest headset in the world, has been downloaded by over a quarter million people from 168 different countries, and was a People’s Voice Winner for Best Narrative Experience at this year’s Webby Awards.

We designed the majority of our game around Meta Quest physical controllers, as we were already halfway through development when Vision Pro came out. But WebXR works across platforms and input modes, and we quickly realized our idea could work on both. Now that Apple has improved the documentation for its new eye-tracking input system, it’s time to revisit that work. What is gaze input, and what could it be good for? Let’s find out.

Prior Art and Documentation

Ada Rose Cannon and Brandel Zachernuk of Apple recently published a blog post explaining how this input works, including its event lifecycle and key privacy-preserving implementation details. The post follows Apple’s debut of beta support for hand tracking on Vision Pro within weeks of its launch, and features a three.js example that deserves examination as well.

https://www.youtube.com/watch?v=9VQTxt39cNA

Uniquely, this input provides a gaze vector on selectstart; that is, transform data that can be used to draw a line from between the user’s eyes toward the object they’re looking at the moment the pinch gesture is recognized. Applications can then determine what the user’s focus is in context. Apple rightfully views this data as extremely personal, and it should be treated by developers with great care. To address these concerns, eyeline data is only exposed in a single frame. That should stop malicious developers from abusing the system, but it also prevents any gaze-based highlight effect prior to selecting an object (natively, visionOS renders hover effects at the system level without ever exposing gaze data to the app). We believe Apple will eventually move in this direction for the web as well, but for now this remains a limitation.
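To make that single-frame lifecycle concrete, here is a minimal sketch in plain WebXR. The names xrSession and referenceSpace are assumed to already exist in your app; the logged values only illustrate where the gaze transform lives.

```js
// Minimal sketch: sample the one-frame gaze transform of a transient-pointer input.
// Assumes an active immersive `xrSession` and an XRReferenceSpace `referenceSpace`.
xrSession.addEventListener('selectstart', (event) => {
  const source = event.inputSource;
  if (source.targetRayMode !== 'transient-pointer') return;   // gaze-and-pinch only

  // The event's XRFrame is only valid inside this handler, so read the pose now.
  const pose = event.frame.getPose(source.targetRaySpace, referenceSpace);
  if (!pose) return;
  const {position, orientation} = pose.transform;  // a ray from between the eyes toward the gazed point
  console.log('gaze origin', position, 'gaze orientation', orientation);
});
```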

Despite that, using transient-pointer input for object selection can make sense when the selectable objects occupy enough of your field of view to avoid frustrating near-misses. But as a test case, I’d like to try developing another primary mechanic — and I’ve noticed a consistent issue with Vision Pro content that we may help address.

Locomotion: A Problem Statement

Few if any experiences on Vision Pro allow for full freedom of motion and self-directed movement through virtual space. Apple’s advertising emphasizes experiences that are seated and stationary, with many taking place in flat 2D windows. Even in fully-immersive apps, at best you get linear, hotspot-based teleportation, without a true sense of agency or exploration.

https://www.youtube.com/watch?v=8Z8W7vqxIv8

A major reason for this is the lack of consistent and compelling teleport mechanics for content that relies on hand tracking. The “laser-pointer” method of teleportation is widely accepted and accurate when using physical controllers, but Apple isn’t making handheld devices for the time being, so designers and developers must re-calibrate. Hand tracking does allow for a 1:1 replica of the controller-like cursor extending from the wrist, but that approach has both practical and technical issues. With the forearm raised, a pinch gesture is easily recognizable from the vantage of Vision Pro’s cameras, and raycasting from the wrist position works. But straighten your elbow a couple of degrees and point down, and the gesture becomes virtually unrecognizable to the camera sensors; the user’s chair also tends to occlude a straight-down gesture. This means that hand tracking, gesture detection, and a wrist vector on their own do not provide a robust solution for teleportation. But I hypothesize we can utilize Apple’s new transient-pointer input, based on both gaze and subtle hand movement, to design a teachable, intuitive teleport mechanic for our game.
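For comparison, here is roughly what the fragile wrist-ray approach looks like using the WebXR Hand Input module. The frame and referenceSpace arguments are assumed to come from your render loop; the null checks are exactly where the occlusion problems described above show up.

```js
// The wrist-ray approach, for comparison (WebXR Hand Input module).
// Returns null whenever the wrist can't be tracked, e.g. when the hand points
// straight down or is occluded by the user's chair.
function getWristRay(frame, inputSource, referenceSpace) {
  if (!inputSource.hand) return null;                      // hand tracking not available
  const wristJoint = inputSource.hand.get('wrist');
  const jointPose = frame.getJointPose(wristJoint, referenceSpace);
  if (!jointPose) return null;                             // joint lost by the cameras
  const {position, orientation} = jointPose.transform;
  return {origin: position, orientation};                  // derive a ray direction from orientation
}
```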

A Minimal Reproduction

While Apple’s lone transient-pointer example is written in three.js, our game is made in Wonderland Engine, a more Unity-like editor and entity-component system designed for the web, whose scripting relies on the glMatrix math library. So some conversion of input logic will be needed.

As mentioned, our game already supports hand tracking. But notably, transient-pointer inputs are technically separate from hand tracking and do not require any special browser permissions. Vision Pro’s “full hand tracking” only reports joint locations for rendering and has no built-in features unless developers add them. If hand tracking is enabled, a transient-pointer will file in line behind these more persistent inputs.

So, with that in mind, my minimal reproduction should (see the sketch after this list):

  • Listen for selectstart events and check for inputs with a targetRayMode set to transient-pointer
  • Pass the input’s target ray space to getPose() to return an XRPose describing its orientation
  • Convert values from reference space to world space (shoutout to perennial Paradowski all-star Ethan Michalicek for realizing this, thereby preserving my sanity for another day)
  • Raycast from the world-space transform to a navmesh collision layer
  • Spawn a teleport reticle, translating its X and Z position with the input’s gripSpace XRRigidTransform until the user releases the pinch
  • Teleport the user to the location of the reticle on selectend
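Here is a rough sketch of those steps in plain WebXR plus glMatrix, outside of Wonderland Engine. Treat it as a sketch of the flow rather than production code: xrSession, referenceSpace, reticle, raycastNavmesh() and teleportPlayerTo() are placeholders for app- or engine-specific pieces, and for simplicity it re-polls targetRaySpace every frame rather than offsetting with gripSpace as in the list above.

```js
import {vec3} from 'gl-matrix';

// Assumes an active immersive `xrSession` and a 'local-floor' `referenceSpace`.
// `reticle`, `raycastNavmesh()` and `teleportPlayerTo()` are app-specific placeholders.
let pinchInput = null;

xrSession.addEventListener('selectstart', (event) => {
  if (event.inputSource.targetRayMode === 'transient-pointer') {
    pinchInput = event.inputSource;                          // gaze-and-pinch began
  }
});

xrSession.addEventListener('selectend', (event) => {
  if (event.inputSource !== pinchInput) return;
  if (reticle.visible) teleportPlayerTo(reticle.position);   // commit on pinch release
  pinchInput = null;
});

function onXRFrame(time, frame) {
  xrSession.requestAnimationFrame(onXRFrame);
  if (!pinchInput) return;

  // Pose of the input's target ray (the gaze line at pinch time, nudged by hand motion).
  const pose = frame.getPose(pinchInput.targetRaySpace, referenceSpace);
  if (!pose) { reticle.visible = false; return; }

  const p = pose.transform.position;
  const o = pose.transform.orientation;
  const origin = vec3.fromValues(p.x, p.y, p.z);
  // A target ray points down its space's -Z axis.
  const direction = vec3.transformQuat(vec3.create(), [0, 0, -1], [o.x, o.y, o.z, o.w]);

  // Convert to world space here if your player rig is offset from the reference space,
  // then raycast against the navmesh layer and park the reticle at the hit point.
  const hit = raycastNavmesh(origin, direction);
  reticle.visible = !!hit;
  if (hit) reticle.position = hit.point;
}
xrSession.requestAnimationFrame(onXRFrame);
```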
The gist at the end of this post contains most of the relevant code, but in a greenfield demo project, this is relatively straightforward to implement:

https://www.youtube.com/watch?v=vSfiQ7biKiM

In this default scene, I’m moving a sphere around the floor plane with “gaze-and-pinch,” not actually teleporting. Despite the lack of highlight effect or any preview visual, the eyeline raycasts are quite accurate. In addition to general navigation and UI, you can imagine other uses: a game where you play as Cyclops from the X-Men, eye-based puzzle mechanics, or searching for clues in a detective game.

To test further how this will look and feel in context, I want to see this mechanic in the final studio scene from The Escape Artist.

Even in this MVP state, this feels great and addresses several problems with “laser pointer” teleport mechanics. We don’t rely on controller orientation or wrist position at all, which lets players keep an arm and elbow pose that is both comfortable and easily recognized by the camera sensors. It’s easy to learn, too: the player barely needs to move to aim anywhere in the scene, yet can still make subtle adjustments, even directly under their feet. We’ll add tutorial steps nonetheless, but it’s likely anyone who has successfully navigated to our game on Vision Pro is already familiar with gaze-and-pinch mechanics.

Even for the uninitiated, this quickly starts to feel like mind control; it’s clear Apple has uncovered a key new element of spatial user experience.

Next Steps and Takeaways

Developing a minimum viable reproduction of this feature is perhaps less than half the battle. It still needs to be integrated into the game’s existing logic and tested across multiple headsets and input methods. In addition, some follow-up tests for UX might be:

  • Applying a noise filter to the hand motion driving the teleport reticle, which is still a bit jittery (see the sketch below)

  • Using the normalized rotation delta of the wrist’s Z-axis (i.e. turning a key) to drive teleport rotation

  • Determining whether a more sensible cancel gesture is needed; pointing away from the navigation mesh works, but there may be an easier way

Meta’s operating system is likely to adopt similar features in future products. And indeed, transient-pointer support is already behind a feature flag in the Quest 3 browser, albeit an implementation without actual eye tracking.
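To make the first two follow-ups above concrete, here is one way they could look, reusing the placeholder state from the earlier sketch. The constants and the roll extraction are our own choices, not anything mandated by WebXR.

```js
import {vec3} from 'gl-matrix';

// 1. Noise filter: exponentially smooth the raw navmesh hit point so small hand
//    tremors don't jitter the reticle. Seed `smoothedTarget` with the first hit in practice.
const SMOOTHING = 0.15;                    // 0 = frozen, 1 = no smoothing
const smoothedTarget = vec3.create();

function updateReticle(rawHitPoint, reticle) {
  vec3.lerp(smoothedTarget, smoothedTarget, rawHitPoint, SMOOTHING);
  reticle.position = [smoothedTarget[0], smoothedTarget[1], smoothedTarget[2]];
}

// 2. Teleport rotation: compare the gripSpace roll about its Z-axis ("turning a key")
//    now versus at selectstart, and apply that delta as the post-teleport yaw.
function rollAboutZ(q) {
  return Math.atan2(2 * (q.w * q.z + q.x * q.y), 1 - 2 * (q.y * q.y + q.z * q.z));
}

let rollAtPinchStart = 0;

function onPinchStart(gripPose) {
  rollAtPinchStart = rollAboutZ(gripPose.transform.orientation);
}

function teleportYawDelta(gripPose) {
  return rollAboutZ(gripPose.transform.orientation) - rollAtPinchStart;
}
```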

In my view, both companies should be more liberal in letting users grant trustworthy developers permission to access full camera feeds and gaze data. Even now, when used in the right context, eye-tracking input can be incredibly helpful, natural and magical. Developers would be wise to begin experimenting with it immediately.

Code: