Abstract

This document specifies a mechanism for exposing to a screen-capturing application mouse events which occur over a captured [=display surface=].

Background

Web applications can use {{MediaDevices/getDisplayMedia()}} to capture any [=display surface=] - tabs, windows or screens. When they do, they can also specify the cursor constraint to control whether the cursor's pixels are captured or not.

But what if the application wishes to programmatically observe the location of the cursor? That can be done by scanning each frame and employing heuristics to detect the cursor. But that's neither simple, nor efficient, nor robust.

A mechanism for exposing mouse coordinates over a captured surface to a capturing application is desirable.

Use cases

Use case #1: Cursor enhancement

During video-conferencing calls, Web applications on the receiving side can use our mechanism to highlight, enhance or outright replace the cursor. This can be done without making such distracting adjustments on the capturing side, where the presenter might not need such aids.

The same can be done when recording an instructional video.

Use case #2: Efficiency enhancements during RTC

Captured frames are often encoded and then transmitted remotely. Updates to the cursor location require a new frame to be encoded and transmitted, costing CPU, power and bandwidth. In theory, these costs could be minimized by efficient encoding; in practice, the costs are non-negligible - the entire image needs to be scanned for changes, and there is per-frame IPC overhead which scales with the maximum framerate.

The cursor constraint can be used to omit the cursor from the captured frames. If mouse coordinates were known to the encoding application, it could transmit these coordinates to the receiving application (on another device). The receiver could then redraw a cursor on the other side. (Note that, depending on the decoder, this is likely cheaper than decoding a whole new frame.) Non-trivial complexity would be added, but the gains to CPU, power and bandwidth on both sides might justify it.

CapturedMouseEvent interface

We define a new event type, {{CapturedMouseEvent}}. This is modelled after {{MouseEvent}} from [[UIEVENTS]], but exposes less information. Whereas {{MouseEvent}} exposes to an application the user's interaction with the application itself, and is therefore straightforward security-wise, {{CapturedMouseEvent}} exposes to an application the user's interaction with another application, or even with the operating system itself, calling for more scrutiny as to which information may safely be exposed. At present, only information already available to the capturing application is exposed - the mouse coordinates.

If a capture-session has an associated {{CaptureController}} whose `[[Source]]` is not ended, then the user agent MUST regularly [=fire an event=] named capturedmousechange at the {{CaptureController}}, using {{CapturedMouseEvent}} with its {{Event/bubbles}} and {{Event/cancelable}} attributes set to `false`, to indicate the position of the captured cursor within the [=display surface=] associated with `CaptureController.[[Source]]`, or the cursor's departure from that surface.

The user agent MUST NOT [=fire an event=] if the event immediately preceding it had the same values in all of its fields. (That is to say, two identical events will not be consecutively fired.)

User agents MAY limit the frequency with which events are fired. This can be achieved by briefly buffering events and reporting only the latest one, skipping the intermediate states. User agents SHOULD only buffer for very short periods of time.
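
The following non-normative sketch illustrates, in JavaScript for readability, how a user agent might combine the two rules above by buffering the most recent cursor position, firing at a bounded rate, and suppressing consecutive duplicates. Every name in it (`FLUSH_INTERVAL_MS`, `pending`, `lastFired`, `fire`, `onCursorMoved`) is hypothetical.

        // Non-normative illustration of event deduplication and rate limiting.
        const FLUSH_INTERVAL_MS = 16;   // hypothetical bound on event frequency
        let lastFired = null;           // last event actually fired
        let pending = null;             // latest buffered cursor position

        // Called by the (hypothetical) capture pipeline on every cursor update.
        function onCursorMoved(surfaceX, surfaceY) {
          pending = { surfaceX, surfaceY };
        }

        setInterval(() => {
          if (!pending) return;
          // Skip if identical to the immediately preceding event.
          if (lastFired &&
              lastFired.surfaceX === pending.surfaceX &&
              lastFired.surfaceY === pending.surfaceY) {
            pending = null;
            return;
          }
          fire(pending);   // hypothetical: fires capturedmousechange at the controller
          lastFired = pending;
          pending = null;
        }, FLUSH_INTERVAL_MS);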

The user agent MUST NOT fire capturedmousechange events after the original video track and all of its clones have been stopped.

        [Exposed=Window]
        interface CapturedMouseEvent : Event {
          constructor(DOMString type, optional CapturedMouseEventInit eventInitDict = {});
          readonly attribute long surfaceX;
          readonly attribute long surfaceY;
        };
      
constructor()

Constructs a new {{CapturedMouseEvent}}.

The arguments are passed as is to {{Event}}'s constructor.

If either {{CapturedMouseEventInit/surfaceX}} or {{CapturedMouseEventInit/surfaceY}} is negative, and they are not both equal to -1, then the constructor throws a RangeError exception.
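
The following non-normative snippet shows constructing events directly, for instance in tests; the coordinate values are arbitrary examples.

        // Valid: a position on the captured surface.
        const onSurface = new CapturedMouseEvent("capturedmousechange", {
          surfaceX: 120,
          surfaceY: 45,
        });

        // Valid: both coordinates -1, indicating the cursor is not over the surface.
        const offSurface = new CapturedMouseEvent("capturedmousechange", {
          surfaceX: -1,
          surfaceY: -1,
        });

        // Invalid: a negative coordinate that is not part of the (-1, -1) pair.
        try {
          new CapturedMouseEvent("capturedmousechange", { surfaceX: -1, surfaceY: 45 });
        } catch (e) {
          console.assert(e instanceof RangeError);
        }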

surfaceX

The horizontal coordinate at which the event occurred relative to the origin of the captured [=display surface=].

The only legal negative value is -1. A combination of {{CapturedMouseEvent/surfaceX}} and {{CapturedMouseEvent/surfaceY}} both being set to -1 indicates that the mouse cursor is not over the captured surface.

surfaceY

The vertical coordinate at which the event occurred relative to the origin of the captured [=display surface=].

The only legal negative value is -1. A combination of {{CapturedMouseEvent/surfaceX}} and {{CapturedMouseEvent/surfaceY}} both being set to -1 indicates that the mouse cursor is not over the captured surface.
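
A handler can use this sentinel to distinguish cursor movement from the cursor leaving the surface, as in the following non-normative sketch; the overlay helpers are hypothetical.

        // `controller` is the CaptureController passed to getDisplayMedia().
        controller.oncapturedmousechange = (event) => {
          if (event.surfaceX === -1 && event.surfaceY === -1) {
            hideCursorOverlay();                               // hypothetical helper
          } else {
            moveCursorOverlay(event.surfaceX, event.surfaceY); // hypothetical helper
          }
        };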

CapturedMouseEventInit dictionary

        dictionary CapturedMouseEventInit : EventInit {
          long surfaceX = -1;
          long surfaceY = -1;
        };
      
surfaceX

Initializes the {{CapturedMouseEvent/surfaceX}} attribute of the {{CapturedMouseEvent}} object to the desired horizontal position of the mouse pointer relative to the origin of the captured [=display surface=].

surfaceY

Initializes the {{CapturedMouseEvent/surfaceY}} attribute of the {{CapturedMouseEvent}} object to the desired vertical position of the mouse pointer relative to the origin of the captured [=display surface=].

CaptureController Extensions

We extend {{CaptureController}} to enable developers to listen to {{CapturedMouseEvent}} events dispatched during a capture-session:

        partial interface CaptureController {
          attribute EventHandler oncapturedmousechange;
        };
      
oncapturedmousechange of type {{EventHandler}}

The event type of this event handler is {{capturedmousechange}}.
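
As with other event handler attributes, the same events can also be observed with `addEventListener`; a minimal sketch:

        // Equivalent to assigning controller.oncapturedmousechange.
        controller.addEventListener("capturedmousechange", (event) => {
          console.log(`x=${event.surfaceX}, y=${event.surfaceY}`);
        });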

Examples

In the basic example below, a {{CaptureController}} is passed to {{MediaDevices/getDisplayMedia()}}. An event handler is set on that object in order to receive {{CapturedMouseEvent}}s with the mouse coordinates over the captured surface.

        try {
          const controller = new CaptureController();
          controller.oncapturedmousechange = (event) => {
            console.log(`Mouse coordinates: x=${event.surfaceX}, y=${event.surfaceY}`);
          };
          let mediaStream = await navigator.mediaDevices.getDisplayMedia({ controller });
        } catch (e) {
          console.log(`Unable to acquire screen capture: ${e}`);
        }
      

In the following example, the cursor constraint is used to omit the cursor from the captured display surface. The mouse coordinates are transmitted via an RTCDataChannel to the receiving application, which can then redraw the cursor.

        // A peer-to-peer connection is established and a data channel created for
        // transmitting the mouse events.
        const configuration = {iceServers: [{urls: 'stun:stun.example.org'}]};
        const pc = new RTCPeerConnection(configuration);
        const channel = pc.createDataChannel('mouse-events', {negotiated: true, id: 0});

        // On the capturing side, the capture session is initialized with the mouse
        // cursor omitted from the captured surface. The mouse coordinates are
        // transmitted via the data channel.
        const controller = new CaptureController();
        controller.oncapturedmousechange = (event) => {
          channel.send(JSON.stringify({x: event.surfaceX, y: event.surfaceY}));
        };
        const mediaStream = await navigator.mediaDevices.getDisplayMedia({
          video: { cursor: "never" },
          controller: controller,
        });
        pc.addTrack(mediaStream.getVideoTracks()[0], mediaStream);
        ...

        // On the receiving side, the remote stream is rendered in a video and the
        // coordinates received from the data channel are used to redraw the cursor.
        const remoteView = document.getElementById('my-video-element');
        channel.onmessage = ({data}) => redrawCursor(remoteView, JSON.parse(data));
        pc.ontrack = ({track, streams}) => {
          track.onunmute = () => {
            if (remoteView.srcObject) return;
            remoteView.srcObject = streams[0];
          };
        };
        ...
      

Privacy and Security Considerations

The mechanisms introduced by this specification are only available to an application which has called {{MediaDevices/getDisplayMedia()}}, and where the user selected to capture some [=display surface=] S. This means that the capturing application already has access to all of the pixels visible on S.

If the capturing application so chooses, it can include the cursor among those pixels by specifying the {{CursorCaptureConstraint}} {{CursorCaptureConstraint/"always"}}. This can either be done when calling {{MediaDevices/getDisplayMedia()}}, or through {{MediaStreamTrack/applyConstraints()}}. If {{MediaStreamTrack/applyConstraints()}} is used, the cursor constraint might even be applied to a clone of the original track.
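
For instance, the following non-normative snippet applies the cursor constraint to a clone of the captured video track, leaving the original track unconstrained; it assumes `mediaStream` was obtained from an earlier {{MediaDevices/getDisplayMedia()}} call.

        // Clone the captured video track and request that the cursor always be
        // drawn into the clone's frames.
        const [track] = mediaStream.getVideoTracks();
        const clone = track.clone();
        await clone.applyConstraints({ cursor: "always" });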

It follows that the mechanisms currently included in this specification do NOT change what information is available to a capturing application. These mechanisms only make access to this information cheaper and more reliable, by removing the need for the capturing application to heuristically scan for the cursor's position in the frames.

When implementing this feature, care must be taken that the cursor's location is not exposed in situations where it would not be drawn in the frame. For example, if the captured surface is an inactive native window, and the user moves the cursor across that window while switching between two other applications, then the cursor's location should only be exposed via events if the cursor is also drawn into the frame (or would have been drawn if {{CursorCaptureConstraint/"always"}} were specified as the cursor constraint).