Videoconferencing (VC) is the conduct of a videoconference (also known as a video conference or videoteleconference) by a set of telecommunication technologies which allow two or more locations to communicate by simultaneous two-way video and audio transmissions. It has also been called 'visual collaboration' and is a type of groupware.
Videoconferencing differs from videophone calls in that it is designed to serve a conference or multiple locations rather than individuals. It is an intermediate form of videotelephony, first used commercially in Germany during the late 1930s and later in the United States during the early 1970s as part of AT&T's development of Picturephone technology.
With the introduction of relatively low cost, high capacity broadband telecommunication services in the late 1990s, coupled with powerful computing processors and video compression techniques, videoconferencing has made significant inroads in business, education, medicine and media.
Technology
The core technology used in a videoconferencing system is digital compression of audio and video streams in real time. The hardware or software that performs compression is called a codec (coder/decoder). Compression rates of up to 1:500 can be achieved. The resulting digital stream of 1s and 0s is subdivided into labeled packets, which are then transmitted through a digital network of some kind (usually ISDN or IP). The use of audio modems in the transmission line allows for the use of POTS (Plain Old Telephone Service) in some low-speed applications, such as videotelephony, because they convert the digital pulses to/from analog waves in the audio spectrum range.
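To make the 1:500 figure concrete, the short sketch below (an illustration only, assuming an uncompressed 1080p, 24-bit, 30 fps stream) shows roughly how much bandwidth compression saves.

```typescript
// Illustrative only: rough bitrate arithmetic for an assumed 1080p, 24-bit, 30 fps stream.
const width = 1920;
const height = 1080;
const bitsPerPixel = 24;      // uncompressed RGB
const framesPerSecond = 30;

const rawBitsPerSecond = width * height * bitsPerPixel * framesPerSecond; // ≈ 1.49 Gbit/s
const compressionRatio = 500;                                             // upper end quoted above
const compressedBitsPerSecond = rawBitsPerSecond / compressionRatio;      // ≈ 3 Mbit/s

console.log(`raw: ${(rawBitsPerSecond / 1e9).toFixed(2)} Gbit/s, ` +
            `compressed: ${(compressedBitsPerSecond / 1e6).toFixed(1)} Mbit/s`);
```

At that ratio, a stream that would otherwise need roughly 1.5 Gbit/s fits into about 3 Mbit/s, which is why consumer broadband is sufficient for videoconferencing.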
The other components required for a videoconferencing system include video input (a video camera or webcam), video output (a computer monitor, television or projector), audio input (microphones), audio output (usually loudspeakers associated with the display device or a telephone), and a means of data transfer (an analog or digital telephone network, LAN or the Internet).
There are basically three kinds of videoconferencing systems:
- Dedicated systems have all required components packaged into a single piece of equipment, usually a console with a high quality remote controlled video camera. These cameras can be controlled at a distance to pan left and right, tilt up and down, and zoom. They became known as PTZ cameras. The console contains all electrical interfaces, the control computer, and the software or hardware-based codec. Omnidirectional microphones are connected to the console, as well as a TV monitor with loudspeakers and/or a video projector. There are several types of dedicated videoconferencing devices:
- Large group videoconferencing: non-portable, large, more expensive devices used for large rooms and auditoriums.
- Small group videoconferencing: non-portable or portable, smaller, less expensive devices used for small meeting rooms.
- Individual videoconferencing: usually portable devices meant for single users, with fixed cameras, microphones and loudspeakers integrated into the console.
- Desktop systems are add-ons (hardware boards or software codecs) to normal PCs and laptops, transforming them into videoconferencing devices. A range of different cameras and microphones can be used with the add-on, which contains the necessary codec and transmission interfaces. Most desktop systems work with the H.323 standard. Videoconferences carried out via dispersed PCs are also known as e-meetings. These can be nonstandard systems such as Microsoft Lync, Skype for Business, Google Hangouts, or Yahoo Messenger, or standards-based systems such as Cisco Jabber.
- WebRTC platforms are videoconferencing solutions that do not rely on a locally installed application but are instead available through a standard web browser. Solutions such as Adobe Connect and Cisco WebEx can be accessed by going to a URL sent by the meeting organizer, and various degrees of security can be attached to the virtual "room". Often the user will be required to download a small piece of software, called an "add-in", to enable the browser to access the local camera and microphone and establish a connection to the meeting. WebRTC technology itself does not require any software or add-in installation; instead, a WebRTC-compliant web browser acts as the client to facilitate 1-to-1 and 1-to-many videoconferencing calls, as sketched after this list. Several enhanced flavours of WebRTC technology are provided by third-party vendors.
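The following is a minimal browser-side sketch of how a 1-to-1 WebRTC call might be set up. The `sendToPeer` callback, the STUN server URL and the `#remote` video element are placeholders for whatever signaling channel and page markup a real deployment would use; WebRTC deliberately leaves signaling to the application.

```typescript
// Minimal sketch of a 1-to-1 WebRTC call from a browser, assuming a separate
// signaling channel (sendToPeer below is a hypothetical placeholder).
async function startCall(sendToPeer: (msg: unknown) => void): Promise<RTCPeerConnection> {
  // The STUN server URL is a placeholder; any reachable STUN/TURN service would do.
  const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.org" }] });

  // Capture the local camera and microphone and attach the tracks to the connection.
  const localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  localStream.getTracks().forEach(track => pc.addTrack(track, localStream));

  // Remote media arrives here; attach it to a <video> element to display it.
  pc.ontrack = event => {
    const remoteVideo = document.querySelector<HTMLVideoElement>("#remote");
    if (remoteVideo) remoteVideo.srcObject = event.streams[0];
  };

  // ICE candidates and the SDP offer are relayed to the other party over the
  // signaling channel (WebSocket, SIP, etc.); WebRTC itself does not define it.
  pc.onicecandidate = event => {
    if (event.candidate) sendToPeer({ type: "candidate", candidate: event.candidate });
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: "offer", sdp: offer.sdp });

  return pc;
}
```

The answering side would perform the mirror-image steps: set the received offer as its remote description, create an answer, and send it back over the same signaling channel.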
Conferencing layers
The components within a conferencing system can be divided into several layers: User Interface, Conference Control, Control or Signaling Plane, and Media Plane.
Videoconferencing User Interfaces (VUI) can be either graphical or voice-responsive; graphical interfaces are the kind normally encountered on computer-based systems. User interfaces for conferencing serve a number of purposes, including scheduling, setup, and making a video call. Through the user interface the administrator is able to control the other three layers of the system.
Conference Control performs resource allocation, management and routing. This layer along with the User Interface creates meetings (scheduled or unscheduled) or adds and removes participants from a conference.
The Control (Signaling) Plane contains the stacks that signal the different endpoints to create a call and/or a conference. Signaling protocols include, but are not limited to, H.323 and the Session Initiation Protocol (SIP). These signals control incoming and outgoing connections as well as session parameters.
The Media Plane controls the audio and video mixing and streaming. This layer manages the Real-time Transport Protocol (RTP), the User Datagram Protocol (UDP) and the Real-time Transport Control Protocol (RTCP). RTP over UDP normally carries information such as the payload type (which identifies the codec), frame rate, video size and many others. RTCP, on the other hand, acts as a quality-control protocol for detecting errors during streaming.
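As a concrete illustration of what travels on the media plane, the sketch below decodes the fixed 12-byte RTP header defined in RFC 3550; the fields are the ones mentioned above (payload type, sequence number, timestamp and so on). This is a simplified illustration, not a complete RTP stack: CSRC lists, header extensions, jitter buffering and RTCP handling are omitted.

```typescript
// Sketch: decoding the fixed 12-byte RTP header (RFC 3550) from a received UDP payload.
interface RtpHeader {
  version: number;        // should be 2
  padding: boolean;
  extension: boolean;
  csrcCount: number;
  marker: boolean;
  payloadType: number;    // identifies the codec, as negotiated during signaling
  sequenceNumber: number; // used to detect loss and reordering
  timestamp: number;      // media sampling instant
  ssrc: number;           // synchronization source identifier
}

function parseRtpHeader(packet: Uint8Array): RtpHeader {
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const b0 = view.getUint8(0);
  const b1 = view.getUint8(1);
  return {
    version: b0 >> 6,
    padding: (b0 & 0x20) !== 0,
    extension: (b0 & 0x10) !== 0,
    csrcCount: b0 & 0x0f,
    marker: (b1 & 0x80) !== 0,
    payloadType: b1 & 0x7f,
    sequenceNumber: view.getUint16(2), // RTP fields are in network (big-endian) byte order
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}
```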
Multipoint videoconferencing
Simultaneous videoconferencing among three or more remote points is possible by means of a Multipoint Control Unit (MCU). This is a bridge that interconnects calls from several sources (in a similar way to the audio conference call). All parties call the MCU, or the MCU can also call the parties which are going to participate, in sequence. There are MCU bridges for IP and ISDN-based videoconferencing. There are MCUs which are pure software, and others which are a combination of hardware and software. An MCU is characterised according to the number of simultaneous calls it can handle, its ability to conduct transposing of data rates and protocols, and features such as Continuous Presence, in which multiple parties can be seen on-screen at once. MCUs can be stand-alone hardware devices, or they can be embedded into dedicated videoconferencing units.
The MCU consists of two logical components:
- A single multipoint controller (MC), and
- Multipoint Processors (MP), sometimes referred to as the mixer.
The MC controls the conference while it is active on the signaling plane, which is simply where the system manages conference creation, endpoint signaling and in-conference controls. This component negotiates parameters with every endpoint in the network and controls conferencing resources. While the MC controls resources and signaling negotiations, the MP operates on the media plane and receives media from each endpoint. The MP generates output streams from each endpoint and redirects the information to other endpoints in the conference.
Some systems are capable of multipoint conferencing with no MCU, stand-alone, embedded or otherwise. These use a standards-based H.323 technique known as "decentralized multipoint", where each station in a multipoint call exchanges video and audio directly with the other stations with no central "manager" or other bottleneck. The advantages of this technique are that the video and audio will generally be of higher quality because they don't have to be relayed through a central point. Also, users can make ad-hoc multipoint calls without any concern for the availability or control of an MCU. This added convenience and quality comes at the expense of some increased network bandwidth, because every station must transmit to every other station directly.
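A back-of-the-envelope comparison of the two approaches, with an assumed per-stream bitrate, shows where the extra bandwidth goes:

```typescript
// With an MCU each station sends one uplink stream to the bridge; with decentralized
// multipoint each station sends a copy of its media to every other participant.
// The per-stream bitrate below is an assumption for illustration.
function uplinkStreams(participants: number, decentralized: boolean): number {
  return decentralized ? participants - 1 : 1;
}

const perStreamMbps = 2; // assumed bitrate of one video stream
for (const n of [3, 5, 10]) {
  const mcu = uplinkStreams(n, false) * perStreamMbps;
  const p2p = uplinkStreams(n, true) * perStreamMbps;
  console.log(`${n} sites: MCU uplink ${mcu} Mbit/s per station, decentralized ${p2p} Mbit/s`);
}
```

The decentralized uplink cost grows linearly with the number of participants, which is why this mode is typically limited to small conferences.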
Videoconferencing modes
Videoconferencing systems use several common operating modes:
- Voice-Activated Switch (VAS);
- Continuous Presence.
In VAS mode, the MCU switches which endpoint can be seen by the other endpoints based on audio levels. If there are four sites in a conference, only the site that is currently talking is shown; that is, the location with the loudest voice is seen by the other participants.
Continuous Presence mode displays multiple participants at the same time. The MP in this mode takes the streams from the different endpoints and composes them into a single video image. In this mode, the MCU normally sends the same type of image to all participants. Typically these composed images are called "layouts" and can vary depending on the number of participants in a conference.
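A toy sketch of the VAS selection logic described above follows; the endpoint names and audio levels are made up, and a real MCU would also apply smoothing and hold times so the view does not flicker between speakers.

```typescript
// Voice-Activated Switching, simplified: forward the video of whichever endpoint
// currently reports the highest audio level.
interface Endpoint {
  id: string;
  audioLevel: number; // e.g. a smoothed RMS level for the last measurement interval
}

function selectActiveSpeaker(endpoints: Endpoint[]): Endpoint | undefined {
  return endpoints.reduce<Endpoint | undefined>(
    (loudest, ep) => (!loudest || ep.audioLevel > loudest.audioLevel ? ep : loudest),
    undefined,
  );
}

// Example: with four sites, only "siteB" (the loudest) would be seen by the others.
const active = selectActiveSpeaker([
  { id: "siteA", audioLevel: 0.10 },
  { id: "siteB", audioLevel: 0.62 },
  { id: "siteC", audioLevel: 0.05 },
  { id: "siteD", audioLevel: 0.31 },
]);
console.log(active?.id); // "siteB"
```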
Echo cancellation
A fundamental feature of professional videoconferencing systems is Acoustic Echo Cancellation (AEC). Echo can be defined as the reflected source wave interfering with the new wave created by the source. AEC is an algorithm that detects when sounds or utterances originating from the system's own audio output re-enter its audio input after some time delay. If unchecked, this can lead to several problems, including:
- the remote party hearing their own voice coming back at them (usually significantly delayed)
- strong reverberation, which makes the voice channel useless, and
- howling created by feedback.
Echo cancellation is a processor-intensive task that usually works over a narrow range of sound delays.
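As a hedged illustration of what an echo canceller does, the sketch below implements a basic normalized LMS (NLMS) adaptive filter, one common building block of AEC. The filter length and step size are arbitrary assumptions, and production AEC additionally performs double-talk detection, delay estimation and residual echo suppression.

```typescript
// Simplified NLMS-based echo cancellation sketch: estimate the echo of the far-end
// (loudspeaker) signal present in the microphone signal and subtract it.
function nlmsEchoCancel(
  farEnd: Float32Array, // samples sent to the loudspeaker
  micIn: Float32Array,  // samples captured by the microphone (near-end speech + echo)
  taps = 256,           // filter length (assumed)
  mu = 0.5,             // adaptation step size (assumed)
): Float32Array {
  const w = new Float32Array(taps);     // adaptive filter coefficients (echo path model)
  const out = new Float32Array(micIn.length);

  for (let n = 0; n < micIn.length; n++) {
    // Estimate the echo as the filter output over the recent far-end samples.
    let echoEstimate = 0;
    let energy = 1e-6;                  // regularization to avoid division by zero
    for (let k = 0; k < taps && n - k >= 0; k++) {
      echoEstimate += w[k] * farEnd[n - k];
      energy += farEnd[n - k] * farEnd[n - k];
    }
    // The error (microphone minus estimated echo) is what gets transmitted.
    const e = micIn[n] - echoEstimate;
    out[n] = e;
    // NLMS update: nudge the coefficients toward a better echo-path model.
    for (let k = 0; k < taps && n - k >= 0; k++) {
      w[k] += (mu * e * farEnd[n - k]) / energy;
    }
  }
  return out;
}
```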
Cloud-based video conferencing
Cloud-based video conferencing can be used without the hardware generally required by other video conferencing systems, and can be designed for use by SMEs or larger international companies like Facebook. Cloud-based systems can handle either 2D or 3D video broadcasting. They can also support mobile calls, VoIP, and other forms of video calling, and can come with a video recording function to archive past meetings.
Technical and other issues
Computer security experts have shown that poorly configured or inadequately supervised videoconferencing systems can permit an easy 'virtual' entry by computer hackers and criminals into company premises and corporate boardrooms via their own videoconferencing systems. Some observers argue that three outstanding issues have prevented videoconferencing from becoming a standard form of communication, despite the ubiquity of videoconferencing-capable systems. These issues are:
- Eye contact: Eye contact plays a large role in conversational turn-taking, perceived attention and intent, and other aspects of group communication. While traditional telephone conversations give no eye contact cues, many videoconferencing systems are arguably worse in that they provide an incorrect impression that the remote interlocutor is avoiding eye contact. Some telepresence systems have cameras located in the screens that reduce the amount of parallax observed by the users. This issue is also being addressed through research that generates a synthetic image with eye contact using stereo reconstruction.
Telcordia Technologies, formerly Bell Communications Research, owns a patent for eye-to-eye videoconferencing using rear projection screens with the video camera behind them, which evolved from a 1960s U.S. military system that provided videoconferencing services between the White House and various other government and military facilities. This technique eliminates the need for special cameras or image processing.
- Appearance consciousness: A second psychological problem with videoconferencing is being on camera, with the video stream possibly even being recorded. The burden of presenting an acceptable on-screen appearance is not present in audio-only communication. Early studies by Alphonse Chapanis found that the addition of video actually impaired communication, possibly because of the consciousness of being on camera.
- Signal latency: Transporting and processing digital signals involves many steps, each of which takes time. In a telecommunicated conversation, an increased latency (time lag) larger than about 150–300 ms becomes noticeable and is soon perceived as unnatural and distracting. Therefore, next to a stable, large bandwidth, a small total round-trip time is another major technical requirement for the communication channel of interactive videoconferencing (see the sketch after this list).
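To see how quickly that threshold is reached, here is a rough one-way latency budget with assumed figures for each stage; the individual numbers are illustrative only.

```typescript
// Rough one-way latency budget (all figures assumed) showing how the ~150-300 ms
// threshold mentioned above can be approached even on a well-behaved path.
const budgetMs = {
  captureAndEncode: 40, // camera capture plus codec encoding delay
  network: 60,          // propagation and queuing across the network path
  jitterBuffer: 40,     // receiver buffering to smooth out packet arrival
  decodeAndRender: 30,  // decoding plus display
};

const oneWay = Object.values(budgetMs).reduce((sum, ms) => sum + ms, 0);
console.log(`one-way: ${oneWay} ms, round trip: ${2 * oneWay} ms`); // 170 ms / 340 ms
```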
The issue of eye-contact may be solved with advancing technology, and presumably the issue of appearance consciousness will fade as people become accustomed to videoconferencing.
Impact on the general public
High speed Internet connectivity has become more widely available at a reasonable cost and the cost of video capture and display technology has decreased. Consequently, personal videoconferencing systems based on a webcam, personal computer system, software compression and broadband Internet connectivity have become affordable to the general public. Also, the hardware used for this technology has continued to improve in quality, and prices have dropped dramatically. The availability of freeware (often as part of chat programs) has made software based videoconferencing accessible to many.
For over a century, futurists have envisioned a future where telephone conversations take place as actual face-to-face encounters, with video as well as audio. Sometimes it is simply not possible or practical to have face-to-face meetings with two or more people. Sometimes a telephone conversation or conference call is adequate; other times, e-mail exchanges are adequate. Videoconferencing adds another possible alternative.
Deaf, hard-of-hearing and mute individuals have a particular interest in the development of affordable high-quality videoconferencing as a means of communicating with each other in sign language. Unlike Video Relay Service, which is intended to support communication between a caller using sign language and another party using spoken language, videoconferencing can be used directly between two deaf signers.
Mass adoption and use of videoconferencing is still relatively low, and several causes are often claimed for this. These concerns are among the reasons many systems are used for internal corporate purposes only, where they are less likely to result in lost sales. One alternative for companies lacking dedicated facilities is the rental of videoconferencing-equipped meeting rooms in cities around the world. Clients can book rooms and turn up for the meeting, with all technical aspects being prearranged and support being readily available if needed.
Impact on sign language communications
A video relay service (VRS), also known as a 'video interpreting service' (VIS), is a service that allows deaf, hard-of-hearing and speech-impaired (D-HOH-SI) individuals to communicate by videoconferencing (or similar technologies) with hearing people in real-time, via a sign language interpreter.
A similar video interpreting service called video remote interpreting (VRI) is conducted through a different organization, often called a "Video Interpreting Service Provider" (VISP). VRS is a newer form of telecommunication service for the D-HOH-SI community, which in the United States had been served since 1974 by a non-video technology called the telecommunications relay service (TRS).
One of the first demonstrations of the ability of telecommunications to help sign language users communicate with each other occurred when AT&T's videophone (trademarked as the "Picturephone") was introduced to the public at the 1964 New York World's Fair: two deaf users were able to communicate freely with each other between the fair and another city. Various universities and other organizations, including British Telecom's Martlesham facility, have also conducted extensive research on signing via videotelephony. The use of sign language via videotelephony was hampered for many years by the difficulty of its use over slow analogue copper phone lines, coupled with the high cost of better-quality ISDN (data) phone lines. Those factors largely disappeared with the introduction of more efficient video codecs and the advent of lower-cost high-speed ISDN data and IP (Internet) services in the 1990s.
VRS services have become well developed nationally in Sweden since 1997 and in the United States since the first decade of the 2000s. With the exception of Sweden, VRS has been provided in Europe only since the mid-2000s, and as of 2010 it had not been made available in many European Union countries; most European countries still lack the legislation or the financing for large-scale VRS services and for providing the necessary telecommunication equipment to deaf users. Germany and the Nordic countries are among the other leaders in Europe, while the United States is another world leader in the provisioning of VRS services.