Abstract

This document defines a new API {{MediaDevices/getAllScreensMedia}} for capturing multiple monitors following one user gesture. It is an extension to the Screen Capture API [[screen-capture]].

This document is a draft and is subject to major changes. Early experimentation is encouraged, but the document is not yet intended for implementation.

Introduction

The Screen Capture API [[screen-capture]] enables the capturing of a single display surface in the form of a video track.

Users currently cannot capture all monitors attached to a device at once. Instead, they have to repeat a two-step sequence for each monitor: an interaction with the page through a [=transient activation=] that calls getDisplayMedia, followed by an interaction with the media picker. This process is cumbersome and does not guarantee that all monitors are captured (which may be required by regulations).

This document describes {{MediaDevices/getAllScreensMedia}}, an extension to the Screen Capture API [[screen-capture]]. {{MediaDevices/getAllScreensMedia}} enables the capturing of several of the user's monitors, or parts thereof, without a [=transient activation=]. It returns a list of media streams, each containing a track that corresponds to a captured monitor. As no [=transient activation=] is required, the user agent must ensure appropriate protection, e.g. by allowlisting permitted origins.

This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.

Implementations that use ECMAScript [[ECMA-262]] to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.

Example

The following example demonstrates a request for all monitors using the navigator.mediaDevices.getAllScreensMedia method defined in this document.

      // saveToFile is an application-defined helper, not part of this API.
      const files = [];
      try {
        const mediaStreams = await navigator.mediaDevices.getAllScreensMedia();
        mediaStreams.forEach((mediaStream) => {
          files.push(saveToFile(mediaStream));
        });
      } catch (e) {
        console.log('Unable to acquire screen captures: ' + e);
      }
    

Terminology

This document uses the definitions of {{MediaStream}} and {{MediaStreamTrack}} from [[!GETUSERMEDIA]], origin from [[!url]], {{ScreenDetailed}} from [[!window-management]], and monitor from [[!screen-capture]].

Capturing multiple monitors

Capture of all monitors is enabled through the addition of a new {{MediaDevices/getAllScreensMedia}} method on the {{MediaDevices}} interface.

MediaDevices Additions

          partial interface MediaDevices {
            [Exposed=(Window), IsolatedContext]
            Promise<sequence<MediaStream>> getAllScreensMedia();
          };
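
Because the method is exposed only in isolated contexts, a page may want to check for it before calling it. The following is a minimal sketch of such a check. The helper name supportsAllScreensCapture is hypothetical and not part of this specification; it takes the navigator object as a parameter so that it can also be exercised outside a browser.

```javascript
// Hypothetical helper (not defined by this specification): reports whether
// getAllScreensMedia is exposed on the given navigator-like object.
function supportsAllScreensCapture(nav) {
  return Boolean(
    nav &&
    nav.mediaDevices &&
    typeof nav.mediaDevices.getAllScreensMedia === 'function'
  );
}
```

In a page, the check would be called as supportsAllScreensCapture(navigator). A true result only means the method is exposed; the call can still be rejected by the steps defined in this document.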
        
getAllScreensMedia

When the {{MediaDevices/getAllScreensMedia()}} method is called, the user agent MUST run the following steps:

  1. Let p be a new promise.

  2. If [=this=]'s [=relevant global object=]'s [=associated Document=] is not [=allowed to use=] the [=policy-controlled feature=] named "all-screens-capture", [=reject=] p with a new {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotAllowedError}} and return p.
  3. Run the following steps in parallel:

    1. The user agent MUST obtain permission by checking an allowlist of origins specified by an administrator or device owner. If the origin is not in the allowlist defined by the administrator, [=reject=] p with a new {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotAllowedError}} and abort these steps.

    2. Optionally, due to platform limitations, [=reject=] p with a new {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotAllowedError}} and abort these steps.

    3. Enumerate all monitors, resulting in a set monitorsMedia of media to capture.

      Each provided media in monitorsMedia MUST include precisely one video track. Once selected, the source of each {{MediaDevices/ScreenCaptureMediaStreamTrack}} MUST NOT change.

      Each provided media in monitorsMedia MUST NOT include any audio track.

      If a hardware error such as an OS/program/webpage lock prevents access to at least one device, reject p with a new {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotReadableError}} and abort these steps.

      For each device that is sourcing the selected media in monitorsMedia, using a stable and private id for the device, deviceId, set [[\devicesLiveMap]][deviceId] to true, if it isn’t already true, and set [[\devicesAccessibleMap]][deviceId] to true, if it isn’t already true.

      If device access fails for any reason other than those listed above, reject p with a new {{DOMException}} object whose {{DOMException/name}} attribute has the value {{AbortError}} and abort these steps.

    4. Let streams be the list of {{MediaStream}} objects for which the permission was granted.

    5. Resolve p with streams and abort these steps.

  4. Return p.
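
The order of the checks above can be sketched as a small model. This is an illustration under stated assumptions, not user agent code: the env object and all of its fields (policyAllowsFeature, origin, allowlist, platformSupported, monitors) are hypothetical stand-ins for user agent state, the promise is modeled as a plain outcome object, and the {{AbortError}} path for other device failures is omitted for brevity.

```javascript
// Illustrative model only; `env` is a hypothetical stand-in for user agent state.
function rejected(errorName) {
  return { status: 'rejected', errorName };
}

function modelGetAllScreensMedia(env) {
  // Step 2: the "all-screens-capture" permissions policy must allow the call.
  if (!env.policyAllowsFeature) return rejected('NotAllowedError');
  // Step 3.1: the origin must be on the administrator-defined allowlist.
  if (!env.allowlist.includes(env.origin)) return rejected('NotAllowedError');
  // Step 3.2: the platform may be unable to support the capture.
  if (!env.platformSupported) return rejected('NotAllowedError');
  // Step 3.3: enumerate monitors; one inaccessible monitor fails the request.
  if (env.monitors.some((monitor) => monitor.locked)) {
    return rejected('NotReadableError');
  }
  // Each monitor yields one stream with one video track and no audio track.
  const streams = env.monitors.map((monitor) => ({
    monitorId: monitor.id,
    videoTrackCount: 1,
    audioTrackCount: 0,
  }));
  // Steps 3.4 and 3.5: settle the promise with one stream per monitor.
  return { status: 'resolved', streams };
}
```

The model makes one property of the algorithm visible: the request is all or nothing. A single unreadable monitor rejects the whole request rather than resolving with a partial list.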

ScreenCaptureMediaStreamTrack

{{MediaDevices/ScreenCaptureMediaStreamTrack}} extends {{MediaStreamTrack}} and corresponds to a monitor. It additionally provides a connection to the Window Management API [[!window-management]] through the {{MediaDevices/ScreenCaptureMediaStreamTrack/screenDetailed}} method, which returns metadata about the captured monitor.
        [Exposed=Window, IsolatedContext]
        interface ScreenCaptureMediaStreamTrack : MediaStreamTrack {
          ScreenDetailed screenDetailed();
        };
      
screenDetailed
The {{screenDetailed()}} method MUST return a {{ScreenDetailed}} object [[!window-management]] that corresponds to the monitor that is captured by the {{ScreenCaptureMediaStreamTrack}} object.
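
As an illustration of how an application might use this method, the sketch below collects display metadata for each captured monitor. The helper name describeCapturedScreens is hypothetical. It assumes that every stream carries exactly one video track, as required by the steps above, and it reads only attributes that {{ScreenDetailed}} provides in [[!window-management]].

```javascript
// Hypothetical helper (not defined by this specification): maps each captured
// stream to a plain object describing the monitor it was captured from.
function describeCapturedScreens(mediaStreams) {
  return mediaStreams.map((mediaStream) => {
    // Each stream is expected to contain exactly one video track.
    const [track] = mediaStream.getVideoTracks();
    const screen = track.screenDetailed();
    return {
      label: screen.label,
      isPrimary: screen.isPrimary,
      left: screen.left,
      top: screen.top,
      width: screen.width,
      height: screen.height,
    };
  });
}
```

An application could use the result, for example, to name recordings after the monitor label or to arrange previews according to the left and top coordinates.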

Permissions Policy Integration

This specification uses a [=policy-controlled feature=] identified by the string "all-screens-capture". Its [=policy-controlled feature/default allowlist=] is "self".

A [=document=]'s [=Document/permissions policy=] determines whether any content in that document is allowed to use {{MediaDevices/getAllScreensMedia}}. If disabled in any document, no content in the document will be [=allowed to use=] {{MediaDevices/getAllScreensMedia}}.

Privacy & Security Considerations

Privacy Considerations & Usage Indicator Requirements

References in this specification to [[\devicesLiveMap]] and [[\devicesAccessibleMap]] refer to the definitions already created to support Privacy Indicator Requirements for {{MediaDevices/getDisplayMedia()}}.

This specification extends the Privacy Indicator Requirements of {{MediaDevices/getDisplayMedia()}} to include {{MediaDevices/getAllScreensMedia()}}. In addition to these requirements, user agents MUST ensure that privacy indicators are visible at all times and that dismissal of the indicators is not persisted. The privacy indicators MUST identify the origin of the application capturing the screens and MUST clearly inform the user that the monitors are being captured. Only the user may dismiss the privacy indicator. To prevent applications from capturing the screens without the user noticing, the indicators MUST remain active for at least five seconds, even if capturing ends earlier.
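
The five-second minimum can be illustrated with a short calculation. This sketch assumes that the five seconds are measured from the moment capturing starts, which is one possible reading of the requirement; the function and constant names are hypothetical, and timestamps are in milliseconds.

```javascript
// Assumption: the minimum display time is counted from the start of capture.
const MIN_INDICATOR_MS = 5000;

// Returns the earliest timestamp (in ms) at which the user agent may remove
// the indicator on its own, given when capturing started and ended.
function earliestIndicatorRemoval(captureStartMs, captureEndMs) {
  return Math.max(captureEndMs, captureStartMs + MIN_INDICATOR_MS);
}
```

Under this reading, a capture that lasts one second still keeps the indicator visible until five seconds after the start, while a longer capture keeps it visible until capturing ends.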

The user agents MUST provide the user with the means to look up whether any origin is allowed to call {{MediaDevices/getAllScreensMedia()}}. The user agents MUST further provide the user with information on the implications thereof.

The user agents MUST notify the user that capturing may happen in the future if {{MediaDevices/getAllScreensMedia()}} is enabled. The notification MUST be shown before sensitive browser content can be exposed, e.g. on user login. A user agent MUST ensure that the administrator cannot change the allowlist while the user is viewing sensitive browser content.

This section is non-normative.

Security Considerations

This section discusses the major threats and mitigations.

Threat: Cross-site scripting

Attackers might use cross-site scripting to get access to sensitive information by using elevated permissions of the allowlisted apps.
Mitigation
The API is exposed in isolated contexts (i.e. in isolated web apps) only. Isolated web apps are intended to mitigate client-side cross-site scripting attacks by enforcing a strict Content-Security-Policy and {{TrustedType}}, and to mitigate server-side cross-site scripting attacks by bundling and signing the app.

Threat: Violation of organization policies

Use of the API may violate organization policies that control which apps should have access to sensitive information.
Mitigation
User agents must restrict the use of the API based on allowlists defined by the organization's administrator.

Threat: Third-party iframes initiating screen capture

Third-party iframes might initiate screen capture.
Mitigation
The "all-screens-capture" permissions policy controls access and prevents third-party use by default. To further safeguard against third-party attacks, isolated web apps employ a strict Content-Security-Policy that makes it difficult to use external resources (i.e. those not originating from the Web Bundle itself) and enforce cross-origin isolation.