ControLling mUltiple streams for tElepresence (clue)
----------------------------------------------------

 Charter
 Last Modified: 2011-01-11

 Current Status: Active Working Group

 Chair(s):
     Mary Barnes  <mary.barnes@polycom.com>
     Paul Kyzivat  <pkyzivat@cisco.com>

 Real-time Applications and Infrastructure Area Director(s):
     Gonzalo Camarillo  <gonzalo.camarillo@ericsson.com>
     Robert Sparks  <rjsparks@nostrum.com>

 Real-time Applications and Infrastructure Area Advisor:
     Gonzalo Camarillo  <gonzalo.camarillo@ericsson.com>

 Mailing Lists:
     General Discussion: clue@ietf.org
     To Subscribe:       https://www.ietf.org/mailman/listinfo/clue
     Archive:            http://www.ietf.org/mail-archive/web/clue/

Description of Working Group:

In the context of this WG, the term telepresence is used in a general
manner to describe systems that provide high definition, high quality
audio/video enabling a "being-there" experience. One example is an
immersive telepresence system using specially designed and special
purpose rooms with multiple displays permitting life size image
reproduction using multiple cameras, encoders, decoders, microphones
and loudspeakers.

Current telepresence systems are based on open standards such as RTP,
SIP, H.264, the H.323 suite. However, they cannot easily interoperate
with each other without operator assistance and expensive additional
equipment which translates from one vendor to another. A major factor
limiting the interoperability of telepresence systems is the lack of a
standardized way to describe and negotiate the use of the multiple
streams of audio and video comprising the media flows.

The WG will create specifications for SIP-based conferencing systems to
enable communication of information about media streams so that a
sending system, receiving system, or intermediate system can make
reasonable decisions about transmitting, selecting, and rendering media
streams. This enables systems to make choices that optimize user
experience.

This working group is chartered to specify the following information
about media streams from one entity to another entity:

* Spatial relationships of cameras, displays, microphones, and
  loudspeakers - relative to each other and to likely positions of
  participants

* Viewpoint, field of view/capture for camera/microphone/display/
  loudspeaker - so that senders and intermediate devices can understand
  how best to compose streams for receivers, and the receiver will know
  the characteristics of its received streams

* Usage of the stream, for example whether the stream is presentation,
  or document camera output

* Aspect ratio of cameras and displays

* Which sources a receiver wants to receive. For example, it might want
  the source for the left camera, or might want the source chosen by
  VAD (Voice Activity Detection)

Information between sources and sinks about media stream capabilities
will be exchanged.

The working group will define the semantics, syntax, and transport
mechanism for communicating the necessary information. It will consider
whether existing protocols for signaling, messaging and transport are
adequate or need to be extended. Any extensions to IETF protocols will
be done in appropriate WGs, for example extensions to SDP in MMUSIC.

The scope of the work includes describing relatively static relations
between entities (participants and devices). It also includes handling
more dynamic relationships, such as specifying the audio and video
streams for defined speakers. Specifying the location of the current
speakers relative to display microphones needs to be provided
dynamically as speakers move. As part of the receiver telling the
sender what it wants dynamically, explicit receiver notification to the
sender of the desired video stream and video pause will be considered.

The scope includes both systems that provide a fully immersive
experience, and systems that interwork with them and therefore need to
understand the same multiple stream semantics.

The focus of this work is on multiple RTP audio and video streams.
Other media types may be considered, however development of
methodologies for them is not within the scope of this work.

Interoperation with SIP and related standards for audio and video is
required. However, backwards compatibility with existing non-standards
compliant telepresence systems is not required.

This working group is not currently chartered to work on issues of
continuous conference control including: far end camera control, floor
control, conference roster.

The working group may identify interoperability obstacles in existing
open standards. If so, the WG will develop requirements to be
communicated to other IETF WGs or Standards Forums, or recharter as
appropriate.

Reuse of existing protocols and backwards compatibility with
SIP-compliant audio/video endpoints are important factors for the
working group to consider. The work will closely coordinate with the
appropriate areas (e.g., OPS and SEC), and working groups including
AVT, MMUSIC, MEDIACTRL, XCON, and SIPCORE.

Goals and Milestones:

   Jul 2011    Submit informational draft to IESG on use cases
   Jul 2011    Submit informational draft to IESG on framework and
               requirements
   Nov 2011    Submit standards track specification(s) to IESG to
               support framework and requirements

Internet-Drafts:

Posted   Revised    I-D Title   <Filename>
-------- --------   --------------------------------------------
Jun 2011 Jun 2011   <draft-ietf-clue-telepresence-use-cases-00.txt>
                    Use Cases for Telepresence Multi-streams

Request For Comments:

   None to date.