MPEG-V compliant 3D simulation tool for multimedia playback with sensory effects

doi:10.21203/rs.3.rs-7046647/v1

2025 · doi:10.21203/rs.3.rs-7046647/v1

preprint OA: closed

Research Article

Fernando Boronat, Erika Villashagñay, Lluc Simó, Juan González

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7046647/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Traditional multimedia systems normally include audio-visual content that only stimulates the senses of sight and hearing. However, stimulating additional senses can provide more immersive and realistic experiences, increasing the users’ Quality of Experience (QoE). For years, the research community has been working on the development of multimedia systems that include sensory effect metadata associated with the audio-visual content and capable of generating these effects, thus stimulating all the users’ senses. 
Examples of effects are scents (smell), flavours (taste), vibrations, pressure, wind effects (touch), special lighting, temperature, humidity, smoke, sprays (environmental effects), etc. There already exist some related solutions and standards (e.g., MPEG-V) that enable the integration of real sensory effect generation devices into multimedia systems. However, once these integrations are designed, having a complete physical setup with multiple physical devices in different positions around the user to test their performance is costly and allows little flexibility. A faster and cheaper alternative method involves the use of simulators. In this article, an MPEG-V compliant web-based 3D simulator is presented. The user can select audio-visual content, visualise it and check the correct activation/deactivation of each sensory effect during playback, as well as the position from which it is generated, among the 45 positions around the user defined in the standard. Additionally, a communication module with a controller device has been included to be used when it is integrated in a real mulsemedia environment.

Theoretical Computer Science Computer Architecture and Engineering Publishing/Media Media Studies Immersivity MPEG-V Sensory Effects Sensory Experience Mulsemedia Simulation Sensory Effect Simulator

1. Introduction

Traditional multimedia systems typically include audio-visual (AV) content that only stimulates the senses of sight and hearing. However, the scientific community and industry have been working for many years on the development of multimedia systems that include sensory effects metadata (SEM) associated with AV content and that are capable of generating such effects, thus stimulating all the users’ senses. 
Examples of effects include scent (sense of smell), flavour (sense of taste), vibration, pressure, wind effects (sense of touch), special lighting, temperature, humidity, smoke, vaporisation (environmental effects), etc. The main aim of these systems is to stimulate other senses beyond sight and hearing and, therefore, to provide more realistic and immersive user experiences (‘ Seeing is believing, but feeling is the truth ’ [ 1 ]) during the consumption of multimedia content. In order to offer users new sensations by exploring other senses beyond the two traditional ones in interactive multimedia applications, a new concept known as MulSeMedia (Multiple Sensorial Media) has been proposed. In recent years, this concept has been gaining momentum in many areas (such as entertainment, education or virtual reality -VR-), where the inclusion of multisensory effects (hereafter referred to as mulsemedia content) is intended to provide users with increasingly realistic and immersive experiences [ 2 ]. In the past, such a practice has been shown to improve the users’ quality of experience (QoE) [ 3 ][ 4 ][ 5 ][ 6 ]. Currently, we are witnessing a very rapid development of new technologies that expand traditional AV media into multimodal applications. A growing number of companies are manufacturing devices that generate sensory effects, such as Sensory Acumen, Inc. 1 , Olorama Technology 2 (scents) and bHaptics 3 (vibration). In addition, some theme parks (e.g. Portaventura 4 in Spain) and next-generation cinemas ( 4D cinemas 5 ) are already incorporating sensory effects into their experiences. The term mulsemedia first appeared in 2014 in [ 1 ]. In that work, structures or systems that combine the playback of multimedia content with the generation of multiple sensory effects or sensory stimuli were described. 
That work marked a turning point in the study of immersive multimedia experiences, going beyond the visual and auditory fields and adding the stimulation of other senses, such as smell, touch or thermal perception. Since then, research in the mulsemedia area has experienced significant advances, including the emergence of related standards (e.g. MPEG-V [ 7 ]) and Internet of Things (IoT) architectures [ 8 ] that allow the integration of real devices that generate sensory effects in multimedia systems. However, there are still many technical challenges to be addressed in the mulsemedia research area, such as, for example, the effective control of effect generation devices or their integration in home environments, among many others. In [ 9 ], the areas where the inclusion of sensory effects can improve and enrich the users’ QoE in human-computer interaction are listed, such as telecommunications, education ([ 10 ] [ 11 ] [ 12 ]), e-commerce, advertising, entertainment (e.g., in 4D cinemas), tourism ([ 13 ]), health (e.g. therapy), social integration or immersive TV [ 14 ]. In [ 15 ] a comprehensive study on mulsemedia systems is presented, highlighting the little exploration of the mulsemedia field, the challenges (especially in the distribution of mulsemedia content and its generation) and its applicability in the mentioned areas. In [ 2 ], many of the most important developments in mulsemedia up to 2020 are compiled. One of the effects that has been most frequently experimented with is that of scent. The relevance of scent effects in evoking emotions, feelings, changes of attitude and memories, and the need for systems that can recreate them is explored in [ 16 ]. In [ 17 ] the impact on QoE when adding wind and pressure effects (by means of a haptic waistcoat with sensors) in laboratory prototypes is analysed. 
In [ 18 ], the masking effect that multisensory content brings to traditional content when the streams involved in the latter present biases is verified, so achieving the synchronisation of the user's perception of the presentation of sensory effects with the precise playback moments of the AV content being consumed at any given moment is an important challenge to guarantee immersivity and good QoE. In [ 19 ] and [ 20 ], a series of recommendations are proposed in order to guide the design and evaluation of mulsemedia content consumption systems (in particular, scent effect-based ones), in addition to describing some key challenges to successfully integrate such effects: the integration of mulsemedia content based on scent, synchronisation, standardisation, development of scent generation devices, intensity and duration of the effects, applicability in different areas (health, education, tourism...) and their remote distribution. Some interesting instructions are provided regarding the assessment of mulsemedia quality, including laboratory design, assessor preparation and experimental design. In [ 21 ], some background, history, and essential coding, decoding, and communication technologies that underpin this emerging field, with a focus on eXtended Reality (XR) and holographic communication applications, are presented. Possible future lines of research on mulsemedia communications in the context of 6G wireless systems are also discussed. On the other hand, regarding mulsemedia effect generation devices, the work in [ 22 ] presents a compilation of existing devices, as well as a guide for users to build their own mulsemedia environment, both in desktop and immersive scenarios (VR or 360 video). Although they have some limitations, all previous works show how the inclusion of sensory effects brings greater immersion, enjoyment and realism to the consumption of multimedia content. 
Despite the existence of the MPEG-V standard, the way in which sensory effects are signalled and integrated into multimedia streams throughout the distribution chain remains an unresolved issue. Current approaches only consider local multimedia content and evaluation scenarios with very short clips and pre-set temporal information for sensory effects. They do not sufficiently explore the possible combinations between many sensory effects (e.g. scents, wind effects, vibration/pressure...) and their synchronisation with AV content. However, in mulsemedia systems (especially VR systems), the number of sensory effects to be generated and synchronised can be high, which poses additional new research challenges. The impact of relevant factors such as intensity, persistence, degree of perception and delay, and possible masking effects is also not sufficiently analysed. Furthermore, incorporating multimedia content alongside omnidirectional AV content (e.g. 360 video) introduces the additional challenge of sensory effects being dependent on the viewing perspective (field of view or FoV). An example of the infrastructure required for a local mulsemedia environment can be seen in Fig. 1 . It consists of a multimedia content (usually AV) player device communicating, directly or indirectly (through a controller device), with real devices that generate different sensory effects around the user at the appropriate instants depending on the displayed content, i.e. synchronised with the playback of the AV content. However, setting up a complete scenario of this kind, with real physical effect generation devices placed in various positions around the user's position 6 , for the sole purpose of testing the correct functioning of the designed mulsemedia experience, and the corresponding generation of effects at the right instants, is costly and provides little flexibility. 
A much faster and more cost-effective method is to use simulators that allow checking the accuracy of the defined SEM metadata associated with the AV content. These metadata must follow a specific format and contain precise information about the sensory effects to be activated/deactivated from certain positions around the user and at specific moments during playback. Simulators typically include graphical elements (icons or animations) representing virtual effect generation devices (or actuators). This way, when an actuator for a sensory effect needs to be activated, its graphical representation is highlighted to indicate that the effect should be generated at that time and, therefore, perceived by the users of the mulsemedia system or application. In this paper, a web-based 3D simulation tool for the integration of multiple sensory effects in a multimedia consumption experience, compatible with the MPEG-V standard, is presented. It includes a player that accesses AV content stored in a content server (either in common video formats 7 or MPEG-DASH format -for adaptive streaming-, in the current version of the application) and a 3D simulation environment, which allows checking whether the activation/deactivation of sensory effects is triggered at the right instants (i.e. synchronised with the AV content playback) and from the correct positions (around the user), among the 45 that are contemplated in that standard. Furthermore, in order to be able to use the simulation tool also in a real mulsemedia environment in the future, a communications module has been included in it (using WebSocket technology) to be able to connect and send effect activation/deactivation commands, compatible with the MPEG-V standard, to a device called Mulsemedia Controller or MC (see Fig. 1 ), if it exists 8 , which would manage the real physical devices for generating the different sensory effects available in the local mulsemedia environment. 
The MC would therefore be an intermediate device or gateway between the player device (running the player and simulation tool) and the real physical sensory effect generation devices. In the envisaged system, communication between the two devices (player/simulator and MC) would be done using WebSocket technology, which allows smooth and uninterrupted communication between them, and the exchanged messages shall follow the MPEG-V standard. For communication between the MC device and the physical effect generation devices or actuators that generate the real sensory effects, other well-known protocols (e.g., the typical ones used in IoT architectures such as HTTP, MQTT, Bluetooth or Zigbee, among others) or proprietary protocols of the device manufacturers may be used (if provided).

The main contributions of this paper can be summarised as follows:

- Modular web-based 3D mulsemedia simulation tool, including AV content player and MPEG-V compliant mulsemedia metadata rendering.
- Design and development of a 3D simulation environment.
- Emulation of the activation/deactivation of up to 8 different types of sensory effects in real time in a 3D environment around the user, at the 45 positions defined in the MPEG-V standard.
- Communications module integrated in the tool for later use in real mulsemedia scenarios. It is based on WebSocket technology and sends the proper MPEG-V-compatible messages to an MC device in charge of controlling the activation/deactivation of the sensory effects available in the real scenario, according to the user’s configured preferences.

The structure of the article is as follows: Section 2 presents a summary of the studies related to the proposal of the article; Section 3 provides a detailed description of the developed simulation tool, its architecture, design, involved processes and graphical user interface. The article concludes with some brief conclusions and the references used.

2. Related works

To create AV content consumption experiences that include sensory effects, the SEM metadata describing those sensory effects must first be generated and associated with the AV content. The AV content, together with the SEM metadata, can then be distributed and used in applications and players that can interpret them. Over the past decade, several tools have been developed that enable the creation of SEM metadata related to sensory effects. These metadata can be incorporated into AV content consumption experiences and usually align with the specifications outlined in Part 3 of the MPEG-V standard. This section first briefly presents what is included in that part of the standard; it then presents two of the best-known tools created for this purpose, SEVino [ 4 ] [ 23 ] and STEVE [ 24 ] (others exist); finally, it summarises other existing mulsemedia simulation tools, highlighting their main limitations.

2.1. Part 3 of the MPEG-V Standard

The international MPEG-V: Media Context and Control standard [ 7 ] provides tools for describing real and virtual worlds. It specifies a format for exchanging data between the real world (the physical scenario or installation where the multimedia application runs and is perceived by the users) and the virtual world (the application itself). It focuses on the communication and representation of virtual world objects to real world objects and vice versa. Among many other things, it offers many tools for describing and representing sensory effects and devices. It consists of 7 parts: in Part 1, the architecture of MPEG-V is described; Part 2 presents the metadata called ‘Control Information’, which can be used to characterise sensory effect generation devices or actuators (e.g. scent generation devices, ambient lights, fans, etc.) and sensors (e.g. temperature, lighting, humidity, etc.). 
In addition, this part of the standard also defines metadata to describe user’s preferences with respect to generation devices and sensors; Part 3, called ‘ Sensory Information ’, presents tools to describe sensory effects; Part 4 specifies characteristics of virtual world objects (e.g. avatars); Part 5 defines metadata called ‘ Interaction Information ’, which can be used to transmit activation/deactivation commands (addressed to sensory effect generation devices) and current information from sensors; Part 6 specifies tools and types used in different parts of the standard (e.g. classification schemes for scents, light or locations); and finally, Part 7 presents reference software and conformance to the standard. This subsection focuses on Part 3 ( MPEG-V Part 3, ISO/IEC 23005-3 Sensory information [ 25 ]), which defines the description of sensory effects to be reproduced or generated by sensory effect generation devices. This part standardises the syntax and semantics of the effects by providing the basic sensory information structures to be able to generate Sensory Effect Metadata (SEM). These basic structures consist of building blocks and common attributes (e.g. effect type, duration, fading, intensity, etc.) that are used for all specified effects. The standard also defines a Sensory Effect Description Language (SEDL), based on XML ( Extensible Markup Language ), to describe such structures. The attributes allow specifying at which instant a specific effect is to be generated, with what intensity or for how long it is to be activated (i.e., its duration), among other things. The effects must be described using the so-called Sensory Effect Vocabulary (SEV). SEV allows specifying all sensory effects (e.g. light, scent) in detail, their general attributes (activation information, duration, priority and position) and, in addition, depending on the effect, their specific attributes (e.g. colour and specific scents for light and scent effects, respectively). 
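The common attributes listed above (effect type, activation instant, duration, intensity, location, priority) can be pictured as a small record per effect. The following is a minimal sketch of such a record and of a query for the effects active at a given playback time. The class name `SensoryEffect`, the field names, the use of seconds, and the 0.0 to 1.0 intensity scale are illustrative assumptions; they are not the element or attribute names defined by SEDL/SEV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensoryEffect:
    """Hypothetical in-memory record for one effect (not the MPEG-V schema)."""
    effect_type: str                       # e.g. "wind", "light", "scent"
    start: float                           # activation instant, seconds of playback
    duration: float                        # how long the effect stays active, seconds
    intensity: float                       # assumed normalised to the range 0.0-1.0
    location: str = "center:middle:front"  # one of the 45 positions
    priority: int = 0                      # stored only; no ordering is implied here

    def is_active(self, t: float) -> bool:
        """True while playback time t lies in [start, start + duration)."""
        return self.start <= t < self.start + self.duration


def active_effects(effects, t):
    """Return, in their original order, the effects active at playback time t."""
    return [e for e in effects if e.is_active(t)]


# Made-up timeline used only to exercise the functions above.
timeline = [
    SensoryEffect("wind", start=2.0, duration=3.0, intensity=0.8,
                  location="left:middle:front"),
    SensoryEffect("light", start=4.0, duration=1.5, intensity=1.0),
    SensoryEffect("scent", start=10.0, duration=5.0, intensity=0.5),
]

for t in (1.0, 4.5, 5.0, 12.0):
    print(t, [e.effect_type for e in active_effects(timeline, t)])
```

A half-open interval is used so that an effect ending at time t and another starting at time t are never both reported as active at that instant; this is a design choice of the sketch, not something prescribed by the standard.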
According to the standard, SEDL and SEV must be used together to create SEM descriptions that can be understood and interpreted by a so-called Media Processing Engine (MPE), which can be either a decoder or a computer, capable of analysing these descriptions and controlling real effect generation devices that are compatible with the standard. To describe the commands of an effect generation device (also called an actuator ) and the detected information, the MPEG-V standard provides the Interaction Interface Description Language (IIDL). In addition, it defines the Control Information Description Language (CIDL), which is used to define commands for the activation/deactivation of effect generation devices or actuators according to user preferences. This language also enables the specification of the capabilities of effect generation devices and sensors. In the MPEG-V spatial model, sensory effects have a common attribute called location , which specifies the region from which a user of the mulsemedia application should perceive a sensory effect. The MPEG-V spatial model considers the user as the central point of reference, and the location of a sensory effect is defined according to the x, y, and z axes of the 3D space around the user. According to the standard, as shown in Fig. 2 , three planes (front, midway and back) are considered, with three height levels in each plane (bottom, middle and top) and five positions in each height level (left, centerleft, center, centerright and right). Therefore, according to the standard, a total of 45 positions can be distinguished. This way, the location of each of the 45 positions from which a sensory effect can be activated/deactivated is represented as a concatenation of three words. For example, the position “ right:bottom:front ” indicates that the effect will be activated/deactivated in the front plane, at the bottom (floor) and to the right of the 3D space around the user. 
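The spatial model described above can be enumerated directly: five horizontal positions, three height levels and three planes give 5 × 3 × 3 = 45 location strings. The sketch below builds them and expands a location pattern into the concrete positions it covers, treating `*` as standing for every value on its axis (the standard allows `*` to refer to a set of locations). The constant and function names are hypothetical, and the word order (horizontal, then height, then plane) is inferred from the “right:bottom:front” example in the text.

```python
from itertools import product

# Axis vocabularies as listed in the text; the word order
# horizontal:height:plane follows the "right:bottom:front" example.
X_AXIS = ("left", "centerleft", "center", "centerright", "right")
Y_AXIS = ("bottom", "middle", "top")
Z_AXIS = ("front", "midway", "back")
AXES = (X_AXIS, Y_AXIS, Z_AXIS)

ALL_POSITIONS = [":".join(p) for p in product(*AXES)]


def expand_location(pattern: str) -> list:
    """Expand a location pattern into the concrete positions it covers.

    Each of the three colon-separated words must be a valid value for
    its axis or "*", which stands for every value on that axis.
    """
    parts = pattern.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three words separated by ':', got {pattern!r}")
    choices = []
    for part, axis in zip(parts, AXES):
        if part == "*":
            choices.append(axis)
        elif part in axis:
            choices.append((part,))
        else:
            raise ValueError(f"{part!r} is not one of {axis}")
    return [":".join(p) for p in product(*choices)]


print(len(ALL_POSITIONS))
print(expand_location("right:bottom:front"))
print(expand_location("right:top:*"))
```

Validating each word against its own axis means a pattern with words in the wrong slot (for example a horizontal word in the plane slot) is rejected instead of silently matching nothing.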
The symbol “*” can be used to refer to a set of locations. For example, the position “right:top:*” applied to an effect indicates that the effect will be activated (or deactivated) on all devices that can generate that effect and that are located at the top right part of the 3D space.

2.2. Sensory Effect Metadata (SEM) generation

There are several graphical tools that enable the generation of SEM metadata, such as SEVino [ 4 ] [ 23 ], STEVE [ 24 ], RoSE (Representation of Sensory Effects) Studio [ 27 ], SMURF (Sensible Media aUthoRing Factory) [ 28 ], H-Studio [ 29 ], or Real 4D Studio [ 30 ]. However, most of them are obsolete and no longer available, or the authors have been unable to access and test them. This subsection describes the two tools that are accessible and that the authors have tested for creating SEM metadata: SEVino2 and STEVE 2.0. Apart from STEVE 2.0, all of the above are based on the timeline-based temporal synchronisation paradigm and allow users to graphically define SEM metadata (in the case of H-Studio, only for sensory effects of movement and vibration) for AV content and save it to files. A more detailed review and comparison of the existing mulsemedia authoring tools is presented in [ 31 ], as well as several proposals for the representation of sensory effects and their characteristics.

2.2.1 The SEVino2 tool

SEVino2 (Sensory Effect Video Annotation tool) [ 4 ] [ 23 ] is written in Java and uses VLCJ 9 . Its graphical user interface (GUI) is shown in Fig. 3. It generates sensory effect metadata for AV content as XML files that comply with the specifications of the MPEG-V standard. SEVino2 supports 7 types of sensory effects (wind, vibration, lights, temperature, water spray or diffuser, aromas and mist) to be generated during the playback of the associated AV content. 
After an AV content item is selected, the activation/deactivation instants of those sensory effects during its playback can be defined quickly and intuitively. The tool follows a timeline-based temporal synchronisation paradigm. For each effect, several parameters can be defined, such as duration, fading, priority, location, intensity value and range, and start and end of the activation, among others, as shown in Fig. 3. The tool includes a multimedia content player (Sensory Effect Media Player or SEMP) and a simulator (Sensory Effect Simulator or SESim, explained later) to check that the generated SEM metadata are correct and that the defined effects are activated/deactivated at the desired instants and with the desired characteristics. Based on the effects defined by the user, SEVino2 generates an XML file (Fig. 4) with the MPEG-V-compliant SEM metadata description. This file can be modified later, either in the tool itself or manually by the user. In addition to exporting sensory effect annotations to XML files, SEVino2 can import existing MPEG-V SEM description files so that users can easily check, modify or expand them using its GUI. SEVino2 has several important limitations to take into account. On the one hand, its graphical interface does not allow the same type of sensory effect to be activated at the same time from different locations. In other words, it does not allow overlapping of the same type of sensory effect generated from different locations. On the other hand, all effects activated at the same time are grouped by SEVino2 in the SEM metadata (in the XML file) into effect groups (using the GroupOfEffects tag, Fig. 4) with a common activation timestamp in the tag itself. For instance, in an explosion scene, several effects can be activated at the same time (e.g., wind, light and vibration). The GroupOfEffects tag allows these effects to be efficiently combined as a single effect. 
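An application that needs one record per effect can expand each group by copying the group's timestamp onto its members. The sketch below illustrates this on a deliberately simplified XML layout: only the `GroupOfEffects` tag name comes from the text, while the `SEM` root, the `Effect` element, the attribute names `pts`, `type` and `intensity`, the absence of namespaces, and the sample values are all assumptions made for illustration and do not reproduce the real MPEG-V schema or SEVino2 output.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical SEM-like document (NOT the real MPEG-V schema).
SEM_SNIPPET = """
<SEM>
  <GroupOfEffects pts="12000">
    <Effect type="wind" intensity="0.7"/>
    <Effect type="light" intensity="1.0"/>
    <Effect type="vibration" intensity="0.4"/>
  </GroupOfEffects>
  <Effect type="scent" pts="30000" intensity="0.5"/>
</SEM>
"""


def effect_record(node, default_pts=None):
    """Build a flat dict for one Effect node, inheriting pts if it has none."""
    pts = node.get("pts", default_pts)
    if pts is None:
        raise ValueError("effect has no timestamp and no enclosing group")
    return {
        "type": node.get("type"),
        "pts": int(pts),
        "intensity": float(node.get("intensity")),
    }


def flatten_effects(xml_text):
    """Return one record per effect, expanding every GroupOfEffects."""
    root = ET.fromstring(xml_text.strip())
    flat = []
    for node in root:
        if node.tag == "GroupOfEffects":
            group_pts = node.get("pts")
            for child in node.findall("Effect"):
                flat.append(effect_record(child, default_pts=group_pts))
        elif node.tag == "Effect":
            flat.append(effect_record(node))
    return flat


for record in flatten_effects(SEM_SNIPPET):
    print(record)
```

After flattening, grouped and stand-alone effects share the same shape, so later stages (such as a scheduler or a simulator view) do not need to know whether an effect was originally part of a group.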
This grouping may require special processing by applications that interpret these metadata, since they must handle the data for each effect individually rather than as a group.

2.2.2 The STEVE tool

The first version of STEVE (Spatio-Temporal View Editor) was presented in [ 24 ]. It is an authoring tool that allows users with little to no knowledge of multimedia creation languages and models to create interactive multimedia presentations or applications for web and digital TV systems in a simple and user-friendly way. Its synchronisation model is based on a proprietary model called SIMM (Simple Interactive Multimedia Model) and, unlike SEVino2, it follows an event-based temporal synchronisation paradigm. It allows authors to edit spatio-temporal views of hypermedia documents and create causal temporal relationships between their multimedia elements. The editor also supports the definition of user interactions and the simulation of these asynchronous events to preview the hypermedia presentation. Users can also define the properties of the multimedia presentation and verify them within the spatial visualisation interface, for which the tool includes its own player. Applications created in STEVE can be exported to HTML5 and NCL (Nested Context Language) documents. NCL is a standard XML-based declarative language for creating hypermedia documents for digital TV systems and is also an ITU standard for multimedia services and applications for IPTV systems. Version 4 of NCL (NCL 4.0) already integrates sensory effects as first-class entities. This version allows authors to define properties for effect elements and use descriptors to refer to them, while specifying the position of effects in spherical coordinates, making them independent of the physical installation of the application. 
In [ 32 ] and [ 33 ], a new event-based approach is proposed for creating mulsemedia metadata, using a new model called MultiSEM ( Multimedia Sensory Effect Model ) that facilitates graphical development for multimedia applications, allowing multiple sensory effects to be created, integrated and synchronised with multimedia content. Both papers present an extension of the tool (called STEVE 2.0) as a proof of concept for this model, which allows any user to easily and intuitively create mulsemedia presentations and/or applications including metadata about multimedia effects synchronised with the AV content. The tool provides, in a graphical form, causal temporal relationships based on MultiSEM relationships and provides authors with feedback on inconsistencies in temporal synchronisation. In addition, users can create interactivity relationships to activate, for example, sensory effects through user interactions with the multimedia application. The GUI of STEVE 2.0 is shown in Fig. 5 . It shows the content repository in the top left corner, the properties panel in the centre, the preview screen in the top right corner, and the time view in the bottom region. In addition, version 2.0 provides a list of the nine sensory effects supported in the view with the timeline, which are wind, water diffusion, vibration, cold, heat, aromas, lights, flashes, and fog. Furthermore, it allows effects to be grouped together, such as the rainstorm effect, which can consist of flashes, wind and water diffusion effects. Users can select one of these sensory effects, drag it to the timeline for temporal synchronisation with the other multimedia elements, and then define its representation characteristics (e.g., intensity, aroma type, light frequency) and physical positions. To graphically define temporal synchronisation, users can use the 12 temporal causal relationships supported by the tool (defined in [ 24 ]) that appear at the bottom of the interface, below the timeline. 
The tool contains a machine learning-based solution, called STEVEML [ 34 ], to automatically detect and extract possible sensory effects (from among those it supports) associated with video content, without user intervention. The user can then use the GUI to modify or adjust the synchronisation of these effects and their properties.

2.3. Mulsemedia effects simulators

Simulators can help mulsemedia system/application designers check that their designs work properly. Various mulsemedia simulators that have been developed in the past are briefly discussed in this section.

2.3.1. Sensory Effect Simulator (SESim)

The SEVino2 tool includes a simulation tool called Sensory Effect Simulator (SESim) to check the correct activation/deactivation of sensory effects in synchronisation with the associated AV content. Like SEVino2, SESim is developed in Java (using the VLCJ framework) and has a modular architecture. To perform the simulation, SESim needs to receive as inputs the AV content and an additional XML file with the associated SEM metadata (which may have been created with SEVino2). The SESim XML parser module then extracts the sensory effect data from the SEM file and sends it to the simulator module. This module sends the AV content to the player module and the extracted effects to the timer module. The timer module also receives the current playback time to activate/deactivate the corresponding virtual actuator. Figure 6 shows the GUI of this tool. The left image shows the interface when the application is launched, displaying the seven types of effects that can be simulated: wind, lights, fog, temperature, vibration, water diffusion, and scents. Around the video player there is a series of boxes representing the effect generation devices and some additional features, such as their position. The right image shows the simulator in operation at a certain instant during video playback. 
The activated sensory effects and some additional data, such as the colour of the lights or the intensity of the sound or fans, are highlighted in red. Additionally, there is a text panel that displays messages (logs) with information on the effects that are being activated/deactivated during the playback. In addition to analysing and interpreting the SEM metadata in the XML file, SESim supports automatic average colour extraction. If the SEM metadata, generated with SEVino2, indicates that automatic colour extraction is permitted, SESim extracts a video frame every 0.1 seconds and divides it into several parts. The average colour is calculated for each part and rendered in the corresponding light around the video player.

SESim has the following limitations:

- The simulation cannot be paused at a given instant, nor can playback jump to a specific point in the AV content, to check which effects are activated at that time.
- All effect representations are positioned around the video player (Fig. 6), so it is not shown from which of the 45 positions considered by the MPEG-V standard each effect is being generated.
- The number of effect types is limited to 7.

2.3.2 Simulator integrated into STEVE

The STEVE tool allows authors to check the temporal and spatial behaviour of multimedia applications by providing a synchronised graphical view of time and space. The icons of the effects that are activated at any given time appear on the AV player (Fig. 5 shows the heat effect icon in white overlapping the video shown in the upper right corner while a cup of coffee or tea is being served in the video sequence).

This simulator has the following limitations:

- It does not indicate from which of the 45 positions specified in the standard each effect is generated.
- No extra information about the effects is provided (e.g., intensity, fade in/out, etc.).
- The number of effect types displayed in the simulation is also limited (up to 9).
2.3.3 Sensible Media Simulator

Another simulator compatible with the MPEG-V standard is the Sensible Media Simulator [35], whose GUI is shown in Fig. 7. It was developed using Flex and is designed to simulate sensory effects in a car and to test a system designed for that environment, taking advantage of existing devices inside the vehicle to generate sensory effects. It assumes that the car is equipped with a wind system with cooling, heating and ventilation functions, vibrating seats with massage function and heating cables, and an LED lighting system with colour and intensity control. At any given time, the system takes the value provided by the temperature sensor inside the vehicle as input data. This enables adaptive control of the temperature of the generated wind effects according to the measured temperature. The car's entertainment system provides a GUI displaying an AV content player and the available devices that can be used as sensory effect generation devices or actuators. Users only need to load the AV content and the SEM metadata. The simulator uses the CIDL language of the MPEG-V standard to describe the capabilities of the effect generation devices or actuators, as well as the user's sensory preferences. The IIDL language is also used to describe the information detected by the sensors, as well as the device commands. The main drawback of this simulator is that it is designed for, and only useful in, a very specific use case (a multimedia system inside a car).

2.3.4 3D sensory effects simulator based on Maxon Cinema 4D

In [36], a 3D simulator of sensory effects is presented, created with Maxon Cinema 4D (footnote 10), whose GUI is shown in Fig. 8. It is based on an event-based temporal synchronisation paradigm, receives files with SEM metadata and simulates the activation/deactivation of the sensory effects during the multimedia presentation.
Using the simulator, a user can add, remove and reposition effect generation devices or actuators in a 3D room. The actuators are distributed on the walls and ceiling of the room and are represented by black circles with black cone-shaped lines (the cone indicates the direction of the actuator). In the simulator, the direction is fixed and always points towards the centre of the 3D room. An actuator lights up in colour to indicate that it is active at a given moment. Each colour represents a different type of effect actuator: white for light effects, red for heat effects and green for cold wind effects. The intensity of the colour is associated with the intensity of the effect being played. In addition to being compatible with the MPEG-V standard positioning system, the simulator implements an extension of the standard to allow authors to specify the location of sensory effects more precisely. To do this, it uses a spherical coordinate system for positioning. This simulator has the following limitations:
- It only supports 3 types of sensory effects (light, heat and cold wind).
- It displays very limited information about the effects: only the intensity of a sensory effect, through 3 shades of the colour assigned to its 3D actuator.
- It requires the use of a professional software package (Maxon Cinema 4D).

2.3.5 Real4DAStudio 3D sensory effects simulator [30]

In [30], a proprietary mulsemedia content authoring tool called Real4DAStudio is presented, which allows the creation of MPEG-V-based SEM metadata and also enables 3D mulsemedia simulation with up to nine different types of effects (light, flashes, temperature, wind, vibration, water diffusion, aromas, fog and rigid body motion). Content-based interfaces and an event-based paradigm are used to synchronise the effects. To the authors' knowledge, the tool is commercially available (footnote 11) and is not openly distributed, which is a significant limitation.

3. Developed mulsemedia simulation tool

This section presents the web-based tool developed for AV content playback and simulation of the activation/deactivation of multisensory effects, as specified in the metadata associated with that AV content. This tool will usually run on a user's main AV content consumption device. It has been developed using HTML5 technology (HTML, CSS, and JavaScript), as well as libraries for 3D web development. HTML and CSS have been used to structure and style the interface, with the aim of providing an intuitive user experience. JavaScript handles the application logic, facilitating user interaction and controlling the playback of the AV content. It also captures user interaction events and, if a local environment with real effect generation devices exists, communicates with the Mulsemedia Controller device via a WebSocket-based communication channel.

To work properly, the tool requires that the MPEG-V compatible SEM metadata describing the sensory effects related to the AV content to be played has been generated beforehand. This section therefore first explains the methodology for generating the SEM metadata and then describes the developed tool in more detail.

3.1. Mulsemedia content (SEM metadata) generation

As explained in section 2, there are multiple solutions that can be used to create XML-based SEM metadata. In this work, the SEVino2 tool is proposed; it employs the SEDL language and creates an XML file containing the metadata in accordance with part 3 of the MPEG-V standard [26]. A notable benefit of this software is that the SEM metadata files it generates can be easily modified by users with a simple text editor, allowing them to add their own effects or modify the existing ones.
However, as previously mentioned, SEVino2 has two issues that affect the development of mulsemedia tools: it does not allow the same type of sensory effect to be activated at the same time from different locations; and, in the SEM metadata (XML file), it groups all the effects that are activated at the same time within a GroupOfEffects tag, with a single common activation timestamp in the tag itself. The first issue can be overcome by editing the generated XML file manually: the metadata of the effect created with SEVino2 is copied and pasted, and the location of the copy is changed, which creates an identical effect generated from another position. The second issue means that the metadata must be processed to obtain the activation/deactivation times of each individual effect. The tool presented in this paper could be adapted to support any type of effect included in the SEM metadata XML file, even if it is not defined in the MPEG-V standard or is not provided by SEVino2.

The AV content files (in common video or MPEG-DASH formats) that are to be displayed in the tool, together with their associated XML SEM metadata files, must be stored on a multimedia server accessible via the web.

3.2. Modular architecture of the mulsemedia simulation tool

Figure 10 shows the modular architecture of the developed tool. The user selects an AV content item, which has an associated XML file with SEM metadata, and can play it either in full screen or in the 3D simulator. The different modules of the architecture are explained below.

3.2.1 AV Media Player

The tool uses a video player based on the HTML video element and the JavaScript DashJS library. Depending on whether the AV content (chosen by the user from the drop-down list available at the top left of the GUI (footnote 12)) is in MPEG-DASH format, the DashJS MediaPlayerClass instance will be attached to or detached from the HTML video element.
In addition, the player includes controls to start and stop playback, move forward or backward in the video, display the playback time and total duration of the video, control the audio volume, and view the content in full screen, as shown below.

3.2.2 XML parser

This module receives a file containing the SEM metadata and extracts the characteristics of all the effects that can be activated when the user plays the selected AV content. The relevant information about each effect, such as its type, activation/deactivation times, activation intensity, and whether it has gradual activation or deactivation (fade-in or fade-out), is stored in internal data structures.

3.2.3 Sensory effects manager

The internal data structures generated by the XML parser module are passed to the Sensory effects manager, which also receives information about the playback point of the AV content player. This module is responsible for checking whether, at the current playback point, the activation of any effect must be graphically represented at a specific location in the 3D scene. If so, it passes the corresponding instructions to the 3D simulation module. Furthermore, if an MC device is present, it also transfers this information to the communications module so that, if applicable, the corresponding message is sent to that device via a previously established WebSocket channel. To do this, it creates MPEG-V-compatible messages to activate/deactivate the sensory effects with the appropriate parameters, always considering the user's preferences, and passes them to the communications module.

3.2.4 Communications module

As mentioned above, the tool includes a communications module to establish a connection (using WebSocket technology, through an intermediate server) with the MC device (see Fig. 1), if it exists, and to send it the corresponding MPEG-V standard-compliant messages to activate/deactivate the sensory effects properly.
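As an illustration of the XML parser and Sensory effects manager described above, the following minimal Python sketch (the tool itself is written in JavaScript) flattens effects nested in a GroupOfEffects element so that each one carries its own activation/deactivation times, and then selects the effects that are active at a given playback time. The simplified, namespace-free XML and the attribute names (pts, duration, intensity, fade, location) are assumptions made for this example; they do not reproduce the exact MPEG-V schema or the authors' code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical, simplified stand-in for a SEM file (not the real MPEG-V schema).
SEM_XML = """
<SEM>
  <GroupOfEffects pts="5.0">
    <Effect type="WindType" location="left:middle:front" intensity="30" duration="4.0"/>
    <Effect type="LightType" location="center:top:front" intensity="80" duration="2.0" fade="0.5"/>
  </GroupOfEffects>
  <Effect type="VibrationType" pts="6.0" location="center:bottom:midway" intensity="50" duration="1.0"/>
</SEM>
"""


@dataclass
class Effect:
    kind: str
    location: str
    start: float      # activation time, in seconds of media time
    end: float        # deactivation time, in seconds of media time
    intensity: float
    fade: float       # fade duration in seconds (0 means no fade)


def _to_effect(node, start):
    """Build an Effect from an <Effect> node and an explicit start time."""
    duration = float(node.get("duration", "0"))
    return Effect(
        kind=node.get("type"),
        location=node.get("location"),
        start=start,
        end=start + duration,
        intensity=float(node.get("intensity", "0")),
        fade=float(node.get("fade", "0")),
    )


def parse_sem(xml_text):
    """Return a flat, time-ordered list of effects with individual times."""
    root = ET.fromstring(xml_text.strip())
    effects = []
    for node in root:
        if node.tag == "GroupOfEffects":
            # The group carries the common timestamp; copy it to each child.
            group_start = float(node.get("pts"))
            effects.extend(_to_effect(child, group_start)
                           for child in node.findall("Effect"))
        elif node.tag == "Effect":
            effects.append(_to_effect(node, float(node.get("pts"))))
    return sorted(effects, key=lambda e: e.start)


def active_effects(effects, playback_time):
    """Effects whose [start, end) interval contains the playback time."""
    return [e for e in effects if e.start <= playback_time < e.end]
```

Because the set of active effects is computed from the playback time alone, the same query can be repeated after the user skips forward or backward in the content.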
The communications module has a mechanism for controlling connections to the server that manages automatic reconnections in the event of sporadic interruptions of the network (usually a WiFi network) or of the WebSocket channel.

3.2.5 User preferences module

The tool can store users' preferences so that the settings of the effect generation devices can be adjusted to improve the quality of their multimedia consumption experience (i.e., QoE) through personalised services, increasing their level of enjoyment and satisfaction. The tool allows users to choose which of the available effects they want to activate/deactivate, and to configure them during the simulation, discarding those they do not want. They can also configure the characteristics of the effects (e.g., maximum or minimum intensities). If the MC device is present, only the commands corresponding to the activation/deactivation of the accepted effects, considering the user's preferences, are passed to the communications module to be sent to that device via WebSocket.

3.2.6 3D simulation module

The 3D simulation module, which is explained in more detail in the following section, is responsible for representing, in a 3D space that simulates a rectangular room, the different effects that are activated/deactivated during the playback of AV content. The 3D space contains a virtual screen on which the AV content is rendered, and 3D elements that represent the activation/deactivation of effects in each of the 45 positions considered in the MPEG-V standard (see section 2.1) around the user, who is assumed to be in the centre of the room.

3.3. Web-based 3D simulation environment

To simulate the activation/deactivation of (one or more) sensory effects in each of the 45 positions around the user covered by the MPEG-V standard, a 3D web-based environment has been created using Blender (footnote 13) and three.js (footnote 14).
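To make the 45 positions concrete, the sketch below generates one 3D coordinate per position, assuming that the positions form a 5 x 3 x 3 grid (five horizontal, three vertical and three depth values) around a user at the origin. The label names, the room dimensions and the axis orientation are assumptions chosen for illustration; section 2.1 and the MPEG-V standard remain the authoritative description, and the actual tool builds its scene with Blender and three.js rather than with this code.

```python
from itertools import product

# Assumed location labels: 5 horizontal x 3 vertical x 3 depth = 45 positions.
X_LABELS = ("left", "centerleft", "center", "centerright", "right")
Y_LABELS = ("bottom", "middle", "top")
Z_LABELS = ("back", "midway", "front")


def _spread(labels, size):
    """Map labels to evenly spaced coordinates in [-size/2, +size/2]."""
    step = size / (len(labels) - 1)
    return {label: -size / 2 + i * step for i, label in enumerate(labels)}


def sphere_positions(width=6.0, height=3.0, depth=6.0):
    """Return {"x:y:z" label: (x, y, z)} for a room centred on the user.

    The room dimensions are hypothetical; +z is taken as "front" here.
    """
    xs = _spread(X_LABELS, width)
    ys = _spread(Y_LABELS, height)
    zs = _spread(Z_LABELS, depth)
    return {
        f"{x}:{y}:{z}": (xs[x], ys[y], zs[z])
        for x, y, z in product(X_LABELS, Y_LABELS, Z_LABELS)
    }
```

Each returned coordinate could serve as the centre of one transparent sphere in the simulated room.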
On the one hand, as mentioned above, this 3D environment simulates a 3D room with a virtual 2D screen or display on the front wall (the wall that the user faces), on which the video content is displayed (Fig. 11). The user is assumed to be in the centre of the room, facing the virtual 2D display. On the other hand, to visually simulate the activation/deactivation of the effects in each of the 45 defined positions, 45 transparent spheres have been placed in those positions in the 3D environment (Fig. 11). Each sphere changes its colour as sensory effects are activated in or from its position. If several effects are activated in the same position at the same time, the first effect to be activated defines the colour of the sphere in that position. For each of the remaining concurrent effects, a ring in the corresponding colour is drawn around that sphere, as shown in Fig. 11.

As can be seen, the virtual 2D display with the video viewer is situated on the front wall of the room, and the spheres may obstruct the viewer's view of the video. This is not problematic, as the user can interact with the 3D scene using the mouse (e.g., zoom in, move or rotate) to obtain a better view of the simulation of the activation/deactivation of the different effects (Fig. 12). For user interaction with the 3D environment, the three.js Raycaster class (footnote 15) has been used, which detects which object the mouse pointer is over at any given moment within the 3D space.

3.4. Developed mulsemedia simulation tool

3.4.1 Graphical user interface

As illustrated in Fig. 13, the GUI of the developed web-based tool is composed of several key components. At the top of the screen, a drop-down menu is available to select an AV content item from those available on the web server, along with the playback controls (play/pause button, volume control, and progress bar).
The current version of the tool simulates the activation/deactivation of eight sensory effects: fog, light, aromas, mist, temperature, wind, vibration and flashes (to be extended in further work). At the top, there is an information panel showing the types of effects and the colour assigned to each one (Table 1).

Table 1. Colours assigned to the 8 sensory effects supported by the simulation tool

Type of effect | Colour
TemperatureType (Heating/Cooling) | Red
ScentType | Green
LightType | Blue
WindType | Purple
SprayingType | Aqua
FogType | Olive
VibrationType | Black
FlashType | Pink

The user can also select which effects they want to include in the simulation (and, if applicable, in the real mulsemedia system) and their properties. After clicking on the button labelled ‘Effects’, the user preferences window appears (Fig. 14). By default, the activation/deactivation of all the effects included in the downloaded SEM metadata (XML file) is included in the simulation, but users can keep only those that interest them from the user preferences panel and exclude the others (for example, because they find them bothersome). This way, the tool adjusts the settings of the simulation (and, if a real scenario is available, of the generation devices) based on these preferences. For example, in a real mulsemedia scenario, a pregnant woman may not tolerate certain smells, or may not want to experience sudden vibrations or abrupt movements, so she can exclude unpleasant aromas and vibration effects from the mulsemedia experience.

The central part of the tool shows the simulated 3D room with the transparent spheres placed in the 45 positions specified in the MPEG-V standard. Initially, the 3D scene is shown from the back of the room so that all the positions (spheres) represented can be seen. However, as already mentioned, the user can interact with it (rotating, moving or zooming in or out), as shown in Fig. 12.
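The colour rule for concurrent effects and the user preference filtering described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tool's JavaScript code: the colour names come from Table 1, while the function names, the (type, intensity) tuples and the default cap of 100 are hypothetical.

```python
# Colours from Table 1, keyed by effect type.
EFFECT_COLOURS = {
    "TemperatureType": "red",
    "ScentType": "green",
    "LightType": "blue",
    "WindType": "purple",
    "SprayingType": "aqua",
    "FogType": "olive",
    "VibrationType": "black",
    "FlashType": "pink",
}


def apply_preferences(active, enabled_types, max_intensity):
    """Drop effects the user excluded and cap the intensity of the rest.

    active: list of (effect_type, intensity) in order of activation.
    enabled_types: set of effect types the user accepts.
    max_intensity: per-type intensity caps (hypothetical default: 100).
    """
    kept = []
    for effect_type, intensity in active:
        if effect_type not in enabled_types:
            continue
        cap = max_intensity.get(effect_type, 100)
        kept.append((effect_type, min(intensity, cap)))
    return kept


def sphere_appearance(active):
    """First active effect colours the sphere; the others become rings."""
    if not active:
        return {"sphere": None, "rings": []}  # sphere stays transparent
    colours = [EFFECT_COLOURS[effect_type] for effect_type, _ in active]
    return {"sphere": colours[0], "rings": colours[1:]}
```

For instance, if wind, scent and light effects are active at one position and the user has excluded scents, the sphere would be drawn in purple (wind) with a single blue ring (light).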
3.4.2 Tool processes

The tool involves the following processes (footnote 16):

a) AV content selection

Initially, the user must select an item from the AV content selector (drop-down list). That list contains all the AV content items available on the server (in the current version, it includes video files in the most common formats, e.g., MP4, and MPEG-DASH index files with the ‘.mpd’ extension). Upon selection, the XML file with the SEM metadata associated with the selected AV content is also automatically downloaded. The XML parser module processes it, extracting the properties of each of the included effects. These include the type of effect, its position in the 3D space, and the start and end times of the effect generation, among others.

b) Playback of video content

The user can start (and stop) the playback of the AV content using the button labelled ‘Play/Pause’, as well as skip forward or backward during playback (by clicking on the progress bar), without affecting the activation/deactivation of sensory effects at the precise instants defined in the SEM metadata. By default, the web-based tool displays the 3D simulation environment described above, in which the user can view the video playback on the virtual 2D screen included in it and see the 3D simulation of how the effects are dynamically activated/deactivated at each position around the user as playback progresses. To find out which effects are active at a precise instant and in a specific position (i.e., in a coloured sphere), the user can click on the corresponding sphere (i.e., the one in that position), and a pop-up window appears on the right-hand side of the page showing the information of all the effects active in that position (Fig. 15).

c) Full-screen playback

The tool also allows users to view video content in full-screen mode (by selecting the ‘Full screen’ checkbox at the top left, near the AV content selector, see Fig. 13), without displaying the 3D effects simulation, for a better viewing experience. This option has been provided so that, when the tool is used in a real mulsemedia scenario (with an MC device and real sensory effect generation devices), users can view the AV content using the entire screen and enjoy a more realistic and complete mulsemedia experience.

d) Activation/deactivation of real effects

If the user has a real mulsemedia environment and a Mulsemedia Controller device, the communications module is used to send (via the WebSocket protocol) the relevant MPEG-V-compatible commands for activating/deactivating effects. To facilitate this process, the tool includes several elements at the top of the page: a checkbox labelled ‘Mulsemedia Controller’; two boxes for entering the IP address or name of the WebSocket server and the port on which that service is active; and a button labelled ‘Connect’ to initiate the connection. Unless the checkbox is ticked, the other elements remain disabled. The colour of the ‘Connect’ button changes to green or dark grey to indicate connection to or disconnection from the server, respectively, and adjacent text messages indicate the status (Fig. 16).

In this case, therefore, apart from simulating the activation/deactivation of sensory effects in the 3D environment, the tool also sends the corresponding MPEG-V compliant messages to the WebSocket server at the moment the activation/deactivation is shown (or even earlier, depending on the properties of the sensory effect generation devices (footnote 17)), so that the server, in turn, forwards them to the Mulsemedia Controller device. Figure 17 shows an example of a message sent to activate a wind effect (WindType) on the generation device (e.g., a fan) with the identifier ‘wind001’ and an intensity of ‘30’ (%) of its maximum intensity, at the absolute time (absTime) ‘1:30:23’.
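The idea of sending an activation message ahead of time for slow-starting devices (footnote 17) can be sketched as below. This Python fragment is only an illustration under stated assumptions: the Command fields mirror the values quoted from Fig. 17, but the class itself, the reading of absTime as h:mm:ss, and the 3-second start-up delay for wind devices are hypothetical and are not taken from the MPEG-V message format or from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Command:
    effect_type: str   # e.g. "WindType"
    device_id: str     # e.g. "wind001"
    intensity: int     # percentage of the device's maximum intensity
    abs_time: str      # activation time, assumed to be "h:mm:ss"


# Hypothetical start-up delays (seconds) per effect type.
STARTUP_DELAY_S = {"WindType": 3.0}


def abs_time_to_seconds(abs_time):
    """Convert an "h:mm:ss" string into seconds of media time."""
    hours, minutes, seconds = (int(part) for part in abs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def dispatch_time(cmd):
    """Media time at which the message should be sent to the server.

    The message is sent early by the device's start-up delay, but never
    before the start of the content.
    """
    delay = STARTUP_DELAY_S.get(cmd.effect_type, 0.0)
    return max(0.0, abs_time_to_seconds(cmd.abs_time) - delay)
```

With these assumed values, the wind command quoted above would be dispatched three seconds before its nominal activation time, giving the fan time to reach the requested intensity.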
The communication protocol between the simulation tool and the MC device, via WebSocket technology, is beyond the scope of this paper, and its complete implementation is left for further work.

4. Conclusions

In this article, a web-based 3D mulsemedia simulation tool, compatible with the specifications of the MPEG-V standard, has been presented. It enables creators of mulsemedia systems and applications based on that standard to save time and money when checking their correct performance, without the need for real physical sensory effect generation equipment. The tool makes use of AV content and its corresponding SEM metadata (which can be generated using several MPEG-V-compliant mulsemedia authoring tools). It allows the activation/deactivation of the effects included in the SEM metadata to be checked visually in a virtual 3D environment (i.e., whether they are synchronised with the AV content playback), from any of the 45 positions around the user specified by the standard. The tool also allows the user to customise which effects are incorporated into the simulation (by default, all those included in the metadata are simulated) and their properties.

With a view to the future integration of the tool in real physical mulsemedia scenarios, a communications module based on WebSocket technology has been included to facilitate communication with a WebSocket server. Through this server, the MPEG-V compliant activation/deactivation messages will be forwarded to a Mulsemedia Controller device. The development of that device and of the communications protocol with it via WebSocket is left for further work. In addition, the positioning model employed in MPEG-V constrains the number of potential locations for sensory effect generation around the user.
In future versions of the tool, the simulated 3D environment will be modified to incorporate an additional positioning model that also takes spherical coordinates into consideration, and more types of effects will be included. The former will enable the generation of effects to be verified from more specific locations, in addition to the pre-established locations considered in the MPEG-V standard.

Declarations

Funding: The article includes funding (included in the ACK section of the paper). The publication was supported, in part, by the following grants with refs.: PID2021-126645OB-I00, funded by MICIU/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; CIAICO/2022/025, funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital of Generalitat Valenciana (DOGV 8919/05.10.2020); and ACIF/2021/192 for pre-doctoral researchers, funded by Generalitat Valenciana under the ‘Programa I+D+i’ and the European Social Fund (ESF).

Author Contribution Statement

All authors (F.B., E.V., LL.S. and J.G.) contributed to the development and testing of the simulation tool. E.V. and J.G. designed the 3D simulation scenario. E.V., F.B. and LL.S. wrote the main part of the code of the tool. J.G. checked it and corrected some detected mistakes. Additionally, F.B. supervised the work and wrote the first draft of the manuscript. All the authors (F.B., E.V., LL.S. and J.G.) commented on previous versions of the manuscript and read and approved the final manuscript.
Acknowledgements This publication was supported, in part, by the following grants with refs.: PID2021-126645OB-I00, funded by Ministerio de Ciencia, Innovación y Universidades (MICIU), Agencia Estatal de Investigación (AVI) ERDFMICIU/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; CIAICO/2022/025, funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital of Generalitat Valenciana (DOGV 8919/05.10.2020); and ACIF/2021/192 for pre-doctoral researchers, funded by Generalitat Valenciana under the ‘Programa I+D+i ’ and the European Social Fund (ESF). References Ghinea G, Timmerer C, Lin W, Gulliver SR (2014) ‘Mulsemedia: State of the Art, Perspectives, and Challenges.’, ACM Transactions on Multimedia Computing, Communications, and Applications , vol. 11, no. 1s, pp. 1–23, Oct. 10.1145/2617994 Velasco C, Obrist M (2020) ‘Multisensory Experiences: Where the senses meet techonology’, Oxford , p. 112. 10.1093/oso/9780198849629.001.0001 Waltl M, Timmerer C, Hellwagner H (2010) ‘Improving the quality of multimedia experience through sensory effects’, in 2010 2nd International Workshop on Quality of Multimedia Experience, QoMEX 2010 - Proceedings , IEEE, Jun. pp. 124–129. 10.1109/QOMEX.2010.5517704 Waltl M, Rainer B, Timmerer C, Hellwagner H (2012) ‘A toolset for the authoring, simulation, and rendering of sensory experiences’, in MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia , pp. 1469–1472. 10.1145/2393347.2396522 Yuan Z, Ghinea G, Muntean GM (Jan. 2015) Beyond multimedia adaptation: Quality of experience-aware multi-sensorial media delivery. IEEE Trans Multimedia 17(1):104–117. 10.1109/TMM.2014.2371240 Rainer B, Waltl M, Cheng E, Shujau M, Timmerer C, Davis S (2012) ‘Investigating the impact of sensory effects on the Quality of Experience and emotional response in web videos’, in 2012 Fourth International Workshop on Quality of Multimedia Experience , Melbourne, VIC (Australia), pp. 
278–283 ‘ISO/IEC 23005 (2011) Information technology - Media context and control (MPEG-V)’, 2011 Jalal L, Anedda M, Popescu V, Murroni M ‘Internet of Things for Enabling Multi Sensorial TV in Smart Home’, in (2018) IEEE Broadcast Symposium, BTS 2018 , IEEE, Oct. 2018, pp. 1–5. 10.1109/BTS.2018.8550959 Sulema Y (2016) ‘Mulsemedia vs. Multimedia: State of the art and future trends’, in International Conference on Systems, Signals, and Image Processing , IEEE, May pp. 1–5. 10.1109/IWSSIP.2016.7502696 Tal I et al (2020) ‘Mulsemedia in education: A case study on learner experience, motivation, and knowledge gain’, in CSEDU Conference , pp. 180–187 Mohana M, Valliammal N, Suvetha V, Krishnaveni M, Subashini P, Ghinea G ‘A Study on Technology-Enhanced Mulsemedia Learning for Enhancing Learner’s Experience in E-Learning’, (2023) International Conference on Network, Multimedia and Information Technology, NMITCON 2023 , 2023. 10.1109/NMITCON58196.2023.10275964 Muntean CH, Tal I, Bogusevschi D, Bratu M, Bi T, Muntean GM (2024) ‘Mulseplayer: A Multi-Sensorial Media Content Delivery Solution to Enhance End-User Quality of Experience’, IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, BMSB , 10.1109/BMSB62888.2024.10608351 Melo M et al (2022) Immersive multisensory virtual reality technologies for virtual tourism: A study of the user’s sense of presence, satisfaction, emotions, and attitudes. Multimed Syst. 10.1007/s00530-022-00898-7 Marfil D, Boronat F, Gonzalez J, Sapena A (2022) Integration of Multisensorial Effects in Synchronised Immersive Hybrid TV Scenarios. IEEE Access 10:79071–79089. 10.1109/ACCESS.2022.3194170 Covaci A, Zou L, Tal I, Muntean GM, Ghinea G (2018) ‘Is multimedia multisensorial? - A review of mulsemedia systems’, Aug. 01, Association for Computing Machinery . 
10.1145/3233774 Obrist M, Tuch AN, Hornbaek K (2014) ‘Opportunities for odor: experiences with smell and implications for technology’, in CHI ’14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems , May Yuan Z, Bi T, Muntean GM, Ghinea G (2015) ‘Perceived synchronization of mulsemedia services’, IEEE Trans Multimedia , vol. 17, no. 7, pp. 957–966, Jul. 10.1109/TMM.2015.2431915 Ademoye OA, Murray N, Muntean G-M, Ghinea G (Aug. 2016) Audio Masking Effect on Inter-Component Skews in Olfaction-Enhanced Multimedia Presentations. ACM Trans Multimedia Comput Commun Appl 12(4):1–14. 10.1145/2957753 Murray N, Ademoye OA, Ghinea G, Muntean G-M (2017) ‘A Tutorial for Olfaction-Based Multisensorial Media Application Design and Evaluation’, ACM Comput Surv , vol. 50, no. 5, pp. 1–30, Sep. 10.1145/3108243 Murray N, Muntean GM, Qiao Y, Lee B (2018) Olfaction-enhanced multimedia synchronization. MediaSync: Handbook on Multimedia Synchronization. Springer International Publishing, Cham, pp 319–356. doi: 10.1007/978-3-319-65840-7_12 . Akyildiz IF, Guo H, Dai R, Gerstacker W (2023) ‘Mulsemedia Communication Research Challenges for Metaverse in 6G Wireless Systems’, ITU Journal on Future and Evolving Technologies , vol. 4, no. 4, Dec Saleme EB, Covaci A, Mesfin G, Santos CAS, Ghinea G (2019) ‘Mulsemedia DIY: A survey of devices and a tutorial for building your own mulsemedia environment’, Jun. 01, Association for Computing Machinery . 10.1145/3319853 Waltl M, Rainer B, Hellwagner H (Feb. 2013) An end-to-end tool chain for Sensory Experience based on MPEG-V. Signal Process Image Commun 28(2):136–150. 10.1016/J.IMAGE.2012.10.009 de Mattos DP, Muchaluat-Saade DC (2018) ‘STEVE: a Hypermedia Authoring Tool based on the Simple Interactive Multimedia Model’, in DocEng ’18: Proceedings of the ACM Symposium on Document Engineering 2018 , Halifax NS Canada, Aug. pp. 
1–10 ‘International standard ISO/IEC 23005-3 (2013) Information technology — Media context and control — Part 3: Sensory information’ ‘International standard ISO/IEC 23005-3 (2013) Information technology — Media context and control — Part 3: Sensory information (MPEG-V Part 3)’ Choi B, Lee E-S, Yoon K ‘Streaming Media with Sensory Effect’, in (2011) International Conference on Information Science and Applications , Jeju, Korea (South), 2011, pp. 1–6 Sang-Kyun, Kim (2013) Authoring multisensorial content. Signal Process Image Commun 28(2):162–167 Danieau F, Bernon J, Fleureau J, Guillotel P, Mollet N, Christie M (2013) ‘H-Studio: An Authoring Tool for Adding Haptic and Motion Effects to Audiovisual Content’, in Proceedings of the adjunct publication of the 26th annual ACM symposium on User interface software and technology , pp. 83–84 Shin SH, Ha KS, Yun HO, Nam YS (2016) ‘Realistic media authoring tool based on MPEG-V international standard’, in International Conference on Ubiquitous and Future Networks, ICUFN , IEEE, Jul. pp. 730–732. 10.1109/ICUFN.2016.7537133 De Mattos DP, Muchaluat-Saade DC, Ghinea G (2021) ‘Beyond Multimedia Authoring: On the Need for Mulsemedia Authoring Tools’, ACM Comput Surv , pp. 1–31 De Mattos DP, Muchaluat-Saade DC, Ghinea G ‘An Approach for Authoring Mulsemedia Documents Based on Events’, (2020) International Conference on Computing, Networking and Communications, ICNC 2020 , pp. 273–277, Feb. 2020. 10.1109/ICNC47757.2020.9049485 Vieira R, Ivanov M, Abreu R, dos Santos JAF, Mattos D (2023) and D. C. Muchaluat-Saade, ‘Autoria de Aplicações Multissensoriais para TV 3.0 com a Ferramenta STEVE’, pp. 143–149, Oct. 10.5753/WEBMEDIA_ESTENDIDO.2023.236124 de Abreu RS, Mattos D, Santos Jd, Ghinea G (2021) Muchaluat-Saade, ‘Toward content-driven intelligent authoring of mulsemedia applications’. IEEE Multimedia 28(1):7–16 Kim S-K, Joo Y-S, Lee Y (2013) Sensible Media Simulation in an Automobile Application and Human Responses to Sensory Effects. 
ETRI J 35(6):1001–1010 Josué M et al ‘Modeling sensory effects as first-class entities in multimedia applications’, Proceedings of the 9th ACM Multimedia Systems Conference, MMSys (2018), pp. 225–236, 2018, doi:, pp. 225–236, 2018. 10.1145/3204949.3204967 Footnotes https://sensoryacumen.com/ (last access: June 2025) https://www.olorama.com/ (last access: June 2025) https://www.bhaptics.com/ (last access: June 2025) https://www.portaventuraworld.com (last access: June 2025) https://www.oceanografic.org/actividad/cine-4d/ or https://www.heroncity.com/valencia/heron-city-paterna/4dx (last access: June 2025) The MPEG-V standard defines up to 45 positions around the user The HTML tag is used in the simulator, therefore only the video formats supported by the used browser could be played (e.g., MP4, WebM or OGG). The development of the MC device and the communication protocol with it, via WebSocket technology, is beyond the scope of this paper and is left for further work. Java Framework for the VLC Media Player. https://github.com/caprica/vlcj (last access: June 2025) Maxon Cinema 4D: https://www.maxon.net/cinema-4d (last access: June 2025). Real4DStudio http://www.real4dhub.or.kr (last access: June 2025) When selected with the mouse, the drop-down list of the tool will automatically list all contents available in a specific folder on the media server, provided that both the AV content and its associated SEM metadata files (both with the same name but different extension) are available. https://www.blender.org/ (last access: June 2025) https://threejs.org/ (last access: June 2025) https://threejs.org/docs/#api/en/core/Raycaster (last access: June 2025) Note for reviewers : in https://youtu.be/TVm9APxoE3o a preliminary draft of a video about the tool can be watched (apologies because it is in Spanish). If the paper is finally accepted, a better one will be prepared, and its link will be provided here. 
If accepted, the source files of the tool will be made publicly available.
Some effect generation devices take time to start the generation of the desired effect, or the effect takes time to be noticed by the users. Therefore, activation messages should be sent in advance (i.e., a few seconds before the user is supposed to notice the effect). As an example, consider a fan device that generates a wind effect, in which the speed of the blades starts from zero (wind intensity of 0) when it is activated and takes a while to reach the desired wind intensity included in the received activation message.
Additional Declarations
The authors declare no competing interests.
{"props":{"pageProps":{"initialData":{"identity":"rs-7046647","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":480716173,"identity":"2e479010-7d88-40b8-b2a0-9225af4c8883","order_by":0,"name":"FERNANDO BORONAT","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAz0lEQVRIiWNgGAWjYBAC9hlQBj9DApFaeG5AaAnJBpK1GBwgWot087EPH3fY1RkfT3746AbD4TyDA8yHP+DVInMseebMM8kSZmeeGRvnMBwuNjjAliaBT4u9RI4xM28bs4TZjRw2aaCWxJkNPGb4HQbRUi9hPAOuhf8zfodBtByWMJCAauln4GHA6zCQXxhnth2XnAH2i0F6MT8zmxl+LdLNhxk+tlXz87cnP3ycU2Gdx8be/Bivw9CAATANMJOgHgwSSNUwCkbBKBgFwx8AALw2QVnipZCcAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0001-5525-3441","institution":"Universitat Politècnica de València","correspondingAuthor":true,"prefix":"","firstName":"FERNANDO","middleName":"","lastName":"BORONAT","suffix":""},{"id":480716636,"identity":"3293e281-96f2-43f8-b5f7-3ec33e5cdd9a","order_by":1,"name":"Erika Villashagñay","email":"","orcid":"","institution":"Universitat Politècnica de València","correspondingAuthor":false,"prefix":"","firstName":"Erika","middleName":"","lastName":"Villashagñay","suffix":""},{"id":480716637,"identity":"623da18d-1f4b-4e32-b1f6-1de922f09817","order_by":2,"name":"Lluc Simó","email":"","orcid":"","institution":"Universitat Politècnica de València","correspondingAuthor":false,"prefix":"","firstName":"Lluc","middleName":"","lastName":"Simó","suffix":""},{"id":480716638,"identity":"aba7e392-08ef-4f28-999a-b199b09a05cf","order_by":3,"name":"Juan González","email":"","orcid":"","institution":"Universitat 
Politècnica de València","correspondingAuthor":false,"prefix":"","firstName":"Juan","middleName":"","lastName":"González","suffix":""}],"badges":[],"createdAt":"2025-07-04 12:09:14","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-7046647/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7046647/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":86120982,"identity":"c29bfa2a-4f83-461a-a8f2-52cac8cc64e1","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":165712,"visible":true,"origin":"","legend":"\u003cp\u003eStructure of the proposed system\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/2ceddc42a440da820b07dd72.png"},{"id":86120977,"identity":"d5d0c1d5-fb2f-4a2c-8ef4-42c11bcf5fd5","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":258038,"visible":true,"origin":"","legend":"\u003cp\u003eLocation model and reference coordinate system in the MPEG-V standard [26]\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/fc60cdbafae7b291834c59b8.png"},{"id":86120976,"identity":"fd42b116-7552-4852-bcaf-ded8e6c64d81","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":217419,"visible":true,"origin":"","legend":"\u003cp\u003eSEVino2 main page. 
Definition of a sensory effect.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/dcbf81cd2c554011971013f4.png"},{"id":86120973,"identity":"13749ae4-dfbf-4310-905a-1349d8db489f","added_by":"auto","created_at":"2025-07-07 03:43:46","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":545521,"visible":true,"origin":"","legend":"\u003cp\u003eExample of an XML file with SEM metadata generated by using the SEVino2 tool\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/a17fc96b1678d556d2e963b1.png"},{"id":86120990,"identity":"f24467e0-cbaf-4206-b05c-a5d0bfbc4e4a","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":238326,"visible":true,"origin":"","legend":"\u003cp\u003eSteve 2.0 Graphical User Interface[Mat20] [34]\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/35431069647fa5d3fcf94f2f.png"},{"id":86120975,"identity":"33cf0821-7f02-41cf-a107-b4d60cd54bb9","added_by":"auto","created_at":"2025-07-07 03:43:46","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":242175,"visible":true,"origin":"","legend":"\u003cp\u003eSESim tool included in SEVino\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/b91145243c76dc161dba609d.png"},{"id":86121329,"identity":"ce1c06c7-f223-4e27-a591-e6f482c27fc3","added_by":"auto","created_at":"2025-07-07 03:51:47","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":1295979,"visible":true,"origin":"","legend":"\u003cp\u003eSensible Media Simulator [35] Graphical User 
Interface\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/ff4914258d0747700077e9ae.png"},{"id":86120978,"identity":"dd70a15a-9792-49bd-90cf-220ce668a271","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":940451,"visible":true,"origin":"","legend":"\u003cp\u003e3D simulator presented in [36]\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/46b5c88f3725ff63cc7a1d21.png"},{"id":86120991,"identity":"a38a8517-3f87-40d6-9f6a-c6b349925b50","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":1147790,"visible":true,"origin":"","legend":"\u003cp\u003e3D simulation example of [30]\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/d6587f1441f542f7b4145e2c.png"},{"id":86120993,"identity":"1e4ffd03-0336-4c33-8f0d-8a99337245d0","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":206667,"visible":true,"origin":"","legend":"\u003cp\u003eArchitecture of the developed mulsemedia simulation tool\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/dedf2050b82d21d0e4ff427b.png"},{"id":86120980,"identity":"699e0ca4-487b-4b08-81e4-3267e02be882","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":644369,"visible":true,"origin":"","legend":"\u003cp\u003e3D simulation 
environment\u003c/p\u003e","description":"","filename":"image11.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/08467edcdc448f2e218c2de0.png"},{"id":86121000,"identity":"2a482cc7-3449-4fab-830e-e4dfdfdcf143","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":858837,"visible":true,"origin":"","legend":"\u003cp\u003eRotation and scaling of the 3D simulation environment\u003c/p\u003e","description":"","filename":"image12.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/16c2e86193980b890e0d737a.png"},{"id":86121335,"identity":"f5d92863-ec10-4597-9fa4-be5aa0fbbc83","added_by":"auto","created_at":"2025-07-07 03:51:47","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":538744,"visible":true,"origin":"","legend":"\u003cp\u003eGUI of the mulsemedia simulation tool\u003c/p\u003e","description":"","filename":"image13.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/ef46a48b86db76063cd6549b.png"},{"id":86121002,"identity":"e1cfb333-19f5-48ef-a818-a15419d4f5ae","added_by":"auto","created_at":"2025-07-07 03:43:47","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":383615,"visible":true,"origin":"","legend":"\u003cp\u003eUser’s preferences scroll window\u003c/p\u003e","description":"","filename":"image14.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/f095d7e8ca4151fa1a5aa8da.png"},{"id":86121336,"identity":"df0a22a0-1dec-43b5-b48c-ad84450e6425","added_by":"auto","created_at":"2025-07-07 03:51:47","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":299445,"visible":true,"origin":"","legend":"\u003cp\u003eInformation about 2 sensory effects activated in a specific 
position\u003c/p\u003e","description":"","filename":"image15.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/db2fc4f281a58e817f1b547c.png"},{"id":86121800,"identity":"5edf4b03-416a-4cc2-a4d0-18395943353e","added_by":"auto","created_at":"2025-07-07 03:59:47","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":273481,"visible":true,"origin":"","legend":"\u003cp\u003eConnection with WebSocket server\u003c/p\u003e","description":"","filename":"image16.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/af4ecb888babb9330a6f44cf.png"},{"id":86121015,"identity":"f08b8418-5fa3-46dc-91d4-2a889d9a2206","added_by":"auto","created_at":"2025-07-07 03:43:48","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":331444,"visible":true,"origin":"","legend":"\u003cp\u003eExample of a wind effect activation MPEG-V-compliant message,\u003c/p\u003e","description":"","filename":"image17.png","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/5451239224da3e0639bf20ef.png"},{"id":86122058,"identity":"d89a7f57-6cb5-43e3-a5f0-a9f6e64b1608","added_by":"auto","created_at":"2025-07-07 04:07:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":9379437,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7046647/v1/0aee42c9-fbe8-4a3e-a63a-cdef84bd15b4.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eMPEG-V compliant 3D simulation tool for multimedia playback with sensory effects\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eTraditional multimedia systems typically include audio-visual (AV) content that only stimulates the senses of sight and hearing. 
However, the scientific community and industry have been working for many years on the development of multimedia systems that include sensory effects metadata (SEM) associated with AV content and that are capable of generating such effects, thus stimulating all the users\u0026rsquo; senses. Examples of effects include scent (sense of smell), flavour (sense of taste), vibration, pressure, wind effects (sense of touch), special lighting, temperature, humidity, smoke, vaporisation (environmental effects), etc. The main aim of these systems is to stimulate other senses beyond sight and hearing and, therefore, to provide more realistic and immersive user experiences (\u0026lsquo;\u003cem\u003eSeeing is believing, but feeling is the truth\u003c/em\u003e\u0026rsquo; [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]) during the consumption of multimedia content.\u003c/p\u003e\u003cp\u003eIn order to offer users new sensations by exploring other senses beyond the two traditional ones in interactive multimedia applications, a new concept known as \u003cem\u003eMulSeMedia\u003c/em\u003e (Multiple Sensorial Media) has been proposed. In recent years, this concept has been gaining momentum in many areas (such as entertainment, education or virtual reality -VR-), where the inclusion of multisensory effects (hereafter referred to as mulsemedia content) is intended to provide users with increasingly realistic and immersive experiences [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In the past, such a practice has been shown to improve the users\u0026rsquo; quality of experience (QoE) [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e][\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e][\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e][\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. 
Currently, we are witnessing a very rapid development of new technologies that expand traditional AV media into multimodal applications. A growing number of companies are manufacturing devices that generate sensory effects, such as \u003cem\u003eSensory Acumen, Inc.\u003c/em\u003e\u003csup\u003e1\u003c/sup\u003e, \u003cem\u003eOlorama Technology\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e (scents) and \u003cem\u003ebHaptics\u003c/em\u003e\u003csup\u003e3\u003c/sup\u003e (vibration). In addition, some theme parks (e.g. \u003cem\u003ePortaventura\u003c/em\u003e\u003csup\u003e4\u003c/sup\u003e in Spain) and next-generation cinemas (\u003cem\u003e4D cinemas\u003c/em\u003e\u003csup\u003e5\u003c/sup\u003e) are already incorporating sensory effects into their experiences.\u003c/p\u003e\u003cp\u003eThe term mulsemedia first appeared in 2014 in [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. In that work, structures or systems that combine the playback of multimedia content with the generation of multiple sensory effects or sensory stimuli were described. That work marked a turning point in the study of immersive multimedia experiences, going beyond the visual and auditory fields and adding the stimulation of other senses, such as smell, touch or thermal perception. Since then, research in the mulsemedia area has experienced significant advances, including the emergence of related standards (e.g. MPEG-V [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]) and Internet of Things (IoT) architectures [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] that allow the integration of real devices that generate sensory effects in multimedia systems. 
However, there are still many technical challenges to be addressed in the mulsemedia research area, such as, for example, the effective control of effect generation devices or their integration in home environments, among many others.\u003c/p\u003e\u003cp\u003eIn [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], the areas where the inclusion of sensory effects can improve and enrich the users\u0026rsquo; QoE in human-computer interaction are listed, such as telecommunications, education ([\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]), e-commerce, advertising, entertainment (e.g., in 4D cinemas), tourism ([\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]), health (e.g. therapy), social integration or immersive TV [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. In [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] a comprehensive study on mulsemedia systems is presented, highlighting the little exploration of the mulsemedia field, the challenges (especially in the distribution of mulsemedia content and its generation) and its applicability in the mentioned areas. In [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], many of the most important developments in mulsemedia up to 2020 are compiled. One of the effects that has been most frequently experimented with is that of scent. The relevance of scent effects in evoking emotions, feelings, changes of attitude and memories, and the need for systems that can recreate them is explored in [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. 
In [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], the impact on QoE of adding wind and pressure effects (by means of a haptic waistcoat with sensors) in laboratory prototypes is analysed. In [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], the masking effect that multisensory content has on traditional content when the streams involved in the latter present biases is verified. Synchronising the user's perception of the sensory effects with the exact playback instants of the AV content being consumed is therefore an important challenge to guarantee immersion and a good QoE. In [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] and [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], a series of recommendations is proposed to guide the design and evaluation of mulsemedia content consumption systems (in particular, scent-based ones), and some key challenges for successfully integrating such effects are described: the integration of scent-based mulsemedia content, synchronisation, standardisation, the development of scent generation devices, the intensity and duration of the effects, applicability in different areas (health, education, tourism...) and remote distribution. Guidelines are also provided for the assessment of mulsemedia quality, including laboratory design, assessor preparation and experimental design. In [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], the background, history and essential coding, decoding and communication technologies that underpin this emerging field are presented, with a focus on eXtended Reality (XR) and holographic communication applications. 
Possible future lines of research on mulsemedia communications in the context of 6G wireless systems are also discussed.\u003c/p\u003e\u003cp\u003eOn the other hand, regarding mulsemedia effect generation devices, the work in [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] presents a compilation of existing devices, as well as a guide for users to build their own mulsemedia environment, both in desktop and immersive scenarios (VR or 360 video).\u003c/p\u003e\u003cp\u003eAlthough they have some limitations, all previous works show how the inclusion of sensory effects brings greater immersion, enjoyment and realism to the consumption of multimedia content. Despite the existence of the MPEG-V standard, the way in which sensory effects are signalled and integrated into multimedia streams throughout the distribution chain remains an unresolved issue. Current approaches only consider local multimedia content and evaluation scenarios with very short clips and pre-set temporal information for sensory effects. They do not sufficiently explore the possible combinations between many sensory effects (e.g. scents, wind effects, vibration/pressure...) and their synchronisation with AV content. However, in mulsemedia systems (especially VR systems), the number of sensory effects to be generated and synchronised can be high, which poses additional new research challenges. The impact of relevant factors such as intensity, persistence, degree of perception and delay, and possible masking effects is also not sufficiently analysed. Furthermore, incorporating multimedia content alongside omnidirectional AV content (e.g. 360 video) introduces the additional challenge of sensory effects being dependent on the viewing perspective (field of view or FoV).\u003c/p\u003e\u003cp\u003eAn example of the infrastructure required for a local mulsemedia environment can be seen in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
It consists of a multimedia content (usually AV) player device communicating, directly or indirectly (through a controller device), with real devices that generate different sensory effects around the user at the appropriate instants depending on the displayed content, i.e. synchronised with the playback of the AV content. However, setting up a complete scenario of this kind, with real physical effect generation devices placed in various positions around the user's position\u003csup\u003e6\u003c/sup\u003e, for the sole purpose of testing the correct functioning of the designed mulsemedia experience, and the corresponding generation of effects at the right instants, is costly and provides little flexibility. A much faster and more cost-effective method is to use simulators that allow checking the accuracy of the defined SEM metadata associated with the AV content. These metadata must follow a specific format and contain precise information about the sensory effects to be activated/deactivated from certain positions around the user and at specific moments during playback. Simulators typically include graphical elements (icons or animations) representing virtual effect generation devices (or actuators). This way, when an actuator for a sensory effect needs to be activated, its graphical representation is highlighted to indicate that the effect should be generated at that time and, therefore, perceived by the users of the mulsemedia system or application.\u003c/p\u003e\u003cp\u003eIn this paper, a web-based 3D simulation tool for the integration of multiple sensory effects in a multimedia consumption experience, compatible with the MPEG-V standard, is presented. 
It includes a player that accesses AV content stored in a content server (either in common video formats\u003csup\u003e7\u003c/sup\u003e or MPEG-DASH format -for adaptive streaming-, in the current version of the application) and a 3D simulation environment, which allows checking whether the activation/deactivation of sensory effects is triggered at the right instants (i.e. synchronised with the AV content playback) and from the correct positions (around the user), among the 45 that are contemplated in that standard.\u003c/p\u003e\u003cp\u003eFurthermore, in order to be able to use the simulation tool also in a real mulsemedia environment in the future, a communications module has been included in it (using WebSocket technology) to be able to connect and send effect activation/deactivation commands, compatible with the MPEG-V standard, to a device called \u003cem\u003eMulsemedia Controller\u003c/em\u003e or MC (see Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), if it exists\u003csup\u003e8\u003c/sup\u003e, which would manage the real physical devices for generating the different sensory effects available in the local mulsemedia environment. The MC would therefore be an intermediate device or gateway between the player device (running the player and simulation tool) and the real physical sensory effect generation devices. In the envisaged system, communication between the two devices (player/simulator and MC) would be done using WebSocket technology, which allows smooth and uninterrupted communication between them, and the exchanged messages shall follow the MPEG-V standard. 
For communication between the MC device and the physical effect generation devices or actuators that generate the real sensory effects, other well-known protocols (e.g., the typical ones used in IoT architectures such as HTTP, MQTT, Bluetooth or Zigbee, among others) or proprietary protocols of the device manufacturers may be used (if provided).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe main contributions of this paper can be summarised as follows:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eModular web-based 3D mulsemedia simulation tool, including AV content player and MPEG-V compliant mulsemedia metadata rendering.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eDesign and development of a 3D simulation environment.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eEmulation of the activation/deactivation of up to 8 different types of sensory effects in real time in a 3D environment around the user, at the 45 positions defined in the MPEG-V standard.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eCommunications module integrated in the tool to be further used when it is integrated in real mulsemedia scenarios. 
It is based on WebSocket technology and is used to send the proper MPEG-V-compatible messages to an MC device in charge of controlling the activation/deactivation of the sensory effects available in the real scenario, according to the configured user\u0026rsquo;s preferences.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cp\u003eThe structure of the article is as follows: Section 2 presents a summary of the studies related to the proposal of the article; Section \u003cspan refid=\"Sec11\" class=\"InternalRef\"\u003e3\u003c/span\u003e provides a detailed description of the developed simulation tool, its architecture, design, involved processes and graphical user interface. The article concludes with some brief conclusions and the references used.\u003c/p\u003e"},{"header":"2. Related works","content":"\u003cp\u003eTo create AV content consumption experiences that include sensory effects, the SEM metadata describing those effects must first be generated and associated with the AV content. This way, the AV content together with the SEM metadata can then be distributed and used in applications and players that can interpret them. Over the past decade, several tools have been developed that enable the creation of SEM metadata related to sensory effects. These metadata can be incorporated into AV content consumption experiences and usually align with the specifications outlined in Part 3 of the MPEG-V standard. 
In this section, first, what is included in this part of the standard is briefly presented; then, although there are more, two of the best-known tools created for this purpose, SEVino [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] and STEVE [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e] are presented; and, finally, other existing mulsemedia simulation tools are summarized, highlighting their main limitations.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1. Part 3 of the MPEG-V Standard\u003c/h2\u003e\u003cp\u003eThe international \u003cem\u003eMPEG-V: Media Context and Control\u003c/em\u003e standard [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] provides tools for describing real and virtual worlds. It specifies a format for exchanging data between the real world (the physical scenario or installation where the multimedia application runs and is perceived by the users) and the virtual world (the application itself). It focuses on the communication and representation of virtual world objects to real world objects and vice versa. Among many other things, it offers many tools for describing and representing sensory effects and devices. It consists of 7 parts: in Part 1, the architecture of MPEG-V is described; Part 2 presents the metadata called \u0026lsquo;\u003cem\u003eControl Information\u003c/em\u003e\u0026rsquo;, which can be used to characterise sensory effect generation devices or actuators (e.g. scent generation devices, ambient lights, fans, etc.) and sensors (e.g. temperature, lighting, humidity, etc.). 
In addition, this part of the standard also defines metadata to describe user\u0026rsquo;s preferences with respect to generation devices and sensors; Part 3, called \u0026lsquo;\u003cem\u003eSensory Information\u003c/em\u003e\u0026rsquo;, presents tools to describe sensory effects; Part 4 specifies characteristics of virtual world objects (e.g. avatars); Part 5 defines metadata called \u0026lsquo;\u003cem\u003eInteraction Information\u003c/em\u003e\u0026rsquo;, which can be used to transmit activation/deactivation commands (addressed to sensory effect generation devices) and current information from sensors; Part 6 specifies tools and types used in different parts of the standard (e.g. classification schemes for scents, light or locations); and finally, Part 7 presents reference software and conformance to the standard.\u003c/p\u003e\u003cp\u003eThis subsection focuses on Part 3 (\u003cem\u003eMPEG-V Part 3, ISO/IEC 23005-3 Sensory information\u003c/em\u003e [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]), which defines the description of sensory effects to be reproduced or generated by sensory effect generation devices. This part standardises the syntax and semantics of the effects by providing the basic sensory information structures to be able to generate \u003cem\u003eSensory Effect Metadata\u003c/em\u003e (SEM). These basic structures consist of building blocks and common attributes (e.g. effect type, duration, fading, intensity, etc.) that are used for all specified effects. The standard also defines a \u003cem\u003eSensory Effect Description Language\u003c/em\u003e (SEDL), based on XML (\u003cem\u003eExtensible Markup Language\u003c/em\u003e), to describe such structures. The attributes allow specifying at which instant a specific effect is to be generated, with what intensity or for how long it is to be activated (i.e., its duration), among other things. 
The effects must be described using the so-called \u003cem\u003eSensory Effect Vocabulary\u003c/em\u003e (SEV). SEV allows specifying all sensory effects (e.g. light, scent) in detail, their general attributes (activation information, duration, priority and position) and, in addition, depending on the effect, their specific attributes (e.g. colour and specific scents for light and scent effects, respectively). According to the standard, SEDL and SEV must be used together to create SEM descriptions that can be understood and interpreted by a so-called Media Processing Engine (MPE), which can be either a decoder or a computer, capable of analysing these descriptions and controlling real effect generation devices that are compatible with the standard.\u003c/p\u003e\u003cp\u003eTo describe the commands of an effect generation device (also called an \u003cem\u003eactuator\u003c/em\u003e) and the detected information, the MPEG-V standard provides the \u003cem\u003eInteraction Interface Description Language\u003c/em\u003e (IIDL). In addition, it defines the \u003cem\u003eControl Information Description Language\u003c/em\u003e (CIDL), which is used to define commands for the activation/deactivation of effect generation devices or actuators according to user preferences. This language also enables the specification of the capabilities of effect generation devices and sensors.\u003c/p\u003e\u003cp\u003eIn the MPEG-V spatial model, sensory effects have a common attribute called \u003cem\u003elocation\u003c/em\u003e, which specifies the region from which a user of the mulsemedia application should perceive a sensory effect. The MPEG-V spatial model considers the user as the central point of reference, and the location of a sensory effect is defined according to the x, y, and z axes of the 3D space around the user. 
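\u003c/p\u003e\u003cp\u003eAs the rest of this subsection details, the standard combines five horizontal positions, three height levels and three planes, and allows \u0026ldquo;*\u0026rdquo; as a wildcard. The following minimal Python sketch enumerates that grid and expands a wildcard pattern. It assumes the term order x:y:z (horizontal position, height level, plane) of the example \u003cem\u003eright:bottom:front\u003c/em\u003e; the function names are hypothetical and belong neither to the standard nor to any of the tools discussed here.\u003c/p\u003e

```python
from itertools import product

# Axis vocabularies as listed in the text (x: horizontal, y: height, z: plane).
X_POSITIONS = ("left", "centerleft", "center", "centerright", "right")
Y_POSITIONS = ("bottom", "middle", "top")
Z_POSITIONS = ("front", "midway", "back")


def all_locations():
    """Return the 45 concrete locations as 'x:y:z' strings."""
    return [":".join(p) for p in product(X_POSITIONS, Y_POSITIONS, Z_POSITIONS)]


def expand_location(pattern):
    """Expand an 'x:y:z' pattern; '*' matches every value on that axis.

    Hypothetical helper: the x:y:z term order is an assumption taken from
    the 'right:bottom:front' example in the text.
    """
    parts = pattern.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three ':'-separated terms, got {pattern!r}")
    choices = []
    for term, axis in zip(parts, (X_POSITIONS, Y_POSITIONS, Z_POSITIONS)):
        if term == "*":
            choices.append(axis)        # wildcard: keep the whole axis
        elif term in axis:
            choices.append((term,))     # concrete term: keep only that value
        else:
            raise ValueError(f"unknown term {term!r} in {pattern!r}")
    return [":".join(p) for p in product(*choices)]
```

\u003cp\u003eUnder these assumptions, a pattern without wildcards expands to itself, a pattern with one wildcard expands to every position along that axis, and a pattern made only of wildcards covers all 45 positions.\u003c/p\u003e\u003cp\u003e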
According to the standard, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, three planes (front, midway and back) are considered, with three height levels in each plane (bottom, middle and top) and five positions in each height level (left, centerleft, center, centerright and right). Therefore, a total of 45 positions (3 \u0026times; 3 \u0026times; 5) can be distinguished. Each of the 45 positions from which a sensory effect can be activated/deactivated is identified by a concatenation of three terms: horizontal position, height level and plane. For example, the position \u0026ldquo;\u003cem\u003eright:bottom:front\u003c/em\u003e\u0026rdquo; indicates that the effect will be activated/deactivated in the front plane, at the bottom (floor) and to the right of the 3D space around the user. The symbol \u0026ldquo;*\u0026rdquo; can be used as a wildcard to refer to a set of locations. For example, following the same term order, the position \u003cem\u003e\u0026ldquo;right:top:*\u0026rdquo;\u003c/em\u003e applied to an effect indicates that the effect will be activated (or deactivated) on all devices that can generate that effect and that are located at the top right part of the 3D space, in any of the three planes.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2. 
Sensory Effect Metadata (SEM) generation\u003c/h2\u003e\u003cp\u003eThere are several graphical tools that enable the generation of SEM metadata, such as SEVino [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], STEVE [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], RoSE (\u003cem\u003eRepresentation of Sensory Effects\u003c/em\u003e) Studio [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], SMURF (\u003cem\u003eSensible Media aUthoRing Factory\u003c/em\u003e) [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e], H-Studio [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e], or Real 4D Studio [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. However, most of them are obsolete and no longer available, or the authors have been unable to access and test them. This subsection describes the two tools that are accessible and that the authors have tested for creating SEM metadata: SEVino2 and STEVE 2.0. Apart from STEVE 2.0, all of the above follow the timeline-based temporal synchronisation paradigm and allow users to graphically define SEM metadata for AV content (in the case of H-Studio, only for movement and vibration effects) and save it to files. 
A more detailed review and comparison of the existing mulsemedia authoring tools, as well as several proposals for the representation of sensory effects and their characteristics, is presented in [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e\u003cdiv id=\"Sec5\" class=\"Section3\"\u003e\u003ch2\u003e2.2.1 The SEVino2 tool\u003c/h2\u003e\u003cp\u003eSEVino2 (\u003cem\u003eSensory Effect Video Annotation tool\u003c/em\u003e) [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] is written in Java and uses VLCJ\u003csup\u003e9\u003c/sup\u003e. Its graphical user interface (GUI) is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. It generates sensory effect metadata for AV content as XML files that comply with the specifications of the MPEG-V standard. SEVino2 supports seven types of sensory effects (wind, vibration, lights, temperature, water spray or diffuser, aromas and mist) to be generated during the playback of the associated AV content. After selecting the AV content, the user can quickly and intuitively define the activation/deactivation instants of those sensory effects during playback. The tool follows a timeline-based temporal synchronisation paradigm. For each effect, several parameters can be defined, such as duration, fading, priority, location, intensity value and range, and start and end of the activation, among others, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. 
The tool includes a multimedia content player (\u003cem\u003eSensory Effect Media Player\u003c/em\u003e or SEMP) and a simulator (\u003cem\u003eSensory Effect Simulator\u003c/em\u003e or SESim, explained later) to check that the generated SEM metadata are correct and that the defined effects are activated/deactivated at the desired instants and with the desired characteristics.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eBased on the effects defined by the user, SEVino2 generates an XML file (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) with the MPEG-V-compliant SEM metadata description. This file can be modified later, either in the tool itself or manually by the user. In addition to exporting sensory effect annotations to XML files, SEVino2 can import existing MPEG-V SEM description files so that users can easily check, modify or expand them using its GUI.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eSEVino2 has several important limitations to take into account. On the one hand, its graphical interface does not allow the same type of sensory effect to be activated at the same time from different locations. In other words, it does not allow overlapping of the same type of sensory effect generated from different locations. On the other hand, all effects activated at the same time are grouped by SEVino2 in the SEM metadata (in the XML file) into effect groups (using the \u003cem\u003eGroupOfEffects\u003c/em\u003e tag, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) with a common activation timestamp in the tag itself. For instance, in an explosion scene, several effects can be activated at the same time (e.g., wind, light and vibration). The \u003cem\u003eGroupOfEffects\u003c/em\u003e tag allows these effects to be efficiently combined as a single effect. 
This may require special processing to be done by applications that interpret these metadata to process the data for each effect individually rather than as a group.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section3\"\u003e\u003ch2\u003e2.2.2 The STEVE tool\u003c/h2\u003e\u003cp\u003eThe first version of STEVE (\u003cem\u003eSpatio-Temporal View Editor\u003c/em\u003e) was presented in [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. It is an authoring tool that allows users with little to no knowledge of multimedia creation languages and models to create interactive multimedia presentations or applications for web and digital TV systems in a simple and user-friendly way. Its synchronisation model is based on a proprietary model called SIMM (\u003cem\u003eSimple Interactive Multimedia Model\u003c/em\u003e) and, unlike SEVino, it follows an event-based temporal synchronisation paradigm. It allows authors to edit spatio-temporal views of hypermedia documents and create causal temporal relationships between their multimedia elements. The editor also supports the definition of user interactions and the simulation of these asynchronous events to preview the hypermedia presentation. Users can also define the properties of the multimedia presentation and verify them within the spatial visualisation interface. To do this, it includes its own player.\u003c/p\u003e\u003cp\u003eApplications created in STEVE can be exported to HTML5 and NCL (\u003cem\u003eNested Context Language\u003c/em\u003e) documents. NCL is a standard XML-based declarative language for creating hypermedia documents for digital TV systems and is also an ITU standard for multimedia services and applications for IPTV systems. Version 4 of NCL (NCL 4.0) already integrates sensory effects as first-class entities. 
This version allows authors to define properties for effect elements and use descriptors to refer to them, while specifying the position of effects in spherical coordinates, making them independent of the physical installation of the application.\u003c/p\u003e\u003cp\u003eIn [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] and [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], a new event-based approach is proposed for creating mulsemedia metadata, using a new model called MultiSEM (\u003cem\u003eMultimedia Sensory Effect Model\u003c/em\u003e) that facilitates graphical development for multimedia applications, allowing multiple sensory effects to be created, integrated and synchronised with multimedia content. Both papers present an extension of the tool (called STEVE 2.0) as a proof of concept for this model, which allows any user to easily and intuitively create mulsemedia presentations and/or applications including metadata about multimedia effects synchronised with the AV content. The tool provides, in a graphical form, causal temporal relationships based on MultiSEM relationships and provides authors with feedback on inconsistencies in temporal synchronisation. In addition, users can create interactivity relationships to activate, for example, sensory effects through user interactions with the multimedia application. The GUI of STEVE 2.0 is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. It shows the content repository in the top left corner, the properties panel in the centre, the preview screen in the top right corner, and the time view in the bottom region. In addition, version 2.0 provides a list of the nine sensory effects supported in the view with the timeline, which are wind, water diffusion, vibration, cold, heat, aromas, lights, flashes, and fog. 
Furthermore, it allows effects to be grouped together, such as the rainstorm effect, which can consist of flashes, wind and water diffusion effects. Users can select one of these sensory effects, drag it to the timeline for temporal synchronisation with the other multimedia elements, and then define its representation characteristics (e.g., intensity, aroma type, light frequency) and physical positions. To graphically define temporal synchronisation, users can use the 12 temporal causal relationships supported by the tool (defined in [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]) that appear at the bottom of the interface, below the timeline. The tool contains a machine learning-based solution, called STEVEML [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e], to automatically detect and extract possible sensory effects (from among those it supports) associated with video content, without user intervention. The user can then use the GUI to modify or adjust the synchronisation of these effects and their properties.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.3. Mulsemedia effects simulators\u003c/h2\u003e\u003cp\u003eSimulators can be used to help mulsemedia system/application designers check that their designs work properly. Various mulsemedia simulators that have been developed in the past are briefly discussed in this Section.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003e2.3.1. Sensory Effect Simulator (SESim)\u003c/h3\u003e\n\u003cp\u003eThe SEVino2 tool includes a simulation tool called \u003cem\u003eSensory Effect Simulator\u003c/em\u003e (SESim) to check the correct activation/deactivation of sensory effects in synchronization with the associated AV content. Like SEVino2, SESim is also developed in Java (using the VLCJ framework) and has a modular architecture. 
To perform the simulation, SESim needs to receive the AV content and an additional XML file with the associated SEM metadata (which may have been created with SEVino) as inputs. The SESim \u003cem\u003eXML parser module\u003c/em\u003e then extracts the sensory effect data from the SEM file and sends it to the \u003cem\u003esimulator module\u003c/em\u003e. This module sends the AV content to the \u003cem\u003eplayer module\u003c/em\u003e and the extracted effects to the \u003cem\u003etimer module\u003c/em\u003e. The \u003cem\u003etimer module\u003c/em\u003e also receives the current playback time to activate/deactivate the corresponding virtual actuator. Figure \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e shows the GUI of this tool. The left image shows the interface when the application is launched, displaying the seven types of effects that can be simulated: wind, lights, fog, temperature, vibration, water diffusion, and scents. Around the video player there are a series of boxes representing the effect generation devices and some additional features, such as their position. The right image shows the simulator in operation at a certain instant during video playback. The activated sensory effects and some additional data, such as the colour of the lights or the intensity of the sound or fans, are highlighted in red. Additionally, there is a text panel that displays messages (\u003cem\u003elogs\u003c/em\u003e) with information on the effects that are being activated/deactivated during the playback.\u003c/p\u003e\n\u003cp\u003eIn addition to analysing and interpreting the SEM metadata in the XML file, SESim supports automatic average colour extraction. If the SEM metadata, generated with Sevino2, indicates that automatic colour extraction is permitted, SESim extracts a video frame every 0.1 seconds and divides it into several parts. 
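\u003c/p\u003e\n\u003cp\u003eThe per-region colour averaging that SESim applies to each extracted frame can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SESim\u0026apos;s Java code: the frame is assumed to be a list of rows of (r, g, b) tuples, the grid size is a free parameter, and the function name average_colours is hypothetical.\u003c/p\u003e

```python
def average_colours(frame, rows, cols):
    """Split a frame into rows x cols parts and average each part.

    Assumptions (not taken from SESim): `frame` is a list of pixel rows,
    each pixel an (r, g, b) tuple of ints, and the frame has at least
    `rows` pixel rows and `cols` pixel columns, so no part is empty.
    """
    height, width = len(frame), len(frame[0])
    grid = []
    for i in range(rows):
        # Vertical bounds of the i-th band of parts.
        y0, y1 = i * height // rows, (i + 1) * height // rows
        row = []
        for j in range(cols):
            # Horizontal bounds of the j-th part within the band.
            x0, x1 = j * width // cols, (j + 1) * width // cols
            pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(pixels)
            # Integer average per colour channel.
            row.append(tuple(sum(p[c] for p in pixels) // n for c in range(3)))
        grid.append(row)
    return grid
```

\u003cp\u003eEach entry of the returned grid would then drive the virtual light nearest to the corresponding part of the frame.\u003c/p\u003e\n\u003cp\u003e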
The average colour is calculated for each part and rendered in the corresponding light around the video player.\u003c/p\u003e\n\u003cp\u003eSESim has the following limitations:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eThe simulation cannot be paused at a given instant, nor can the user jump to a specific point in the AV content, to check which effects are active at that time.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eAll effect representations are positioned around the video player (Fig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e), so the simulator does not show from which of the 45 positions considered by the MPEG-V standard each effect is being generated.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe number of effect types is limited to 7.\u003c/p\u003e\n \u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv id=\"Sec9\" class=\"Section3\"\u003e\n \u003cdiv class=\"Heading\"\u003e2.3.2 Simulator integrated into STEVE\u003c/div\u003e\n \u003cp\u003eThe STEVE tool allows authors to check the temporal and spatial behaviour of multimedia applications by providing a synchronised graphical view of time and space. The icons of the effects that are active at any given time appear over the AV player (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e shows the white heat effect icon overlapping the video in the upper right corner while a cup of coffee or tea is being served in the video sequence). 
This simulator has the following limitations:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eIt does not indicate from which of the 45 positions specified in the standard each effect is generated either.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eNo extra information about the effects is provided (e.g., intensity, fade in/out, etc.).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe number of effect types displayed in the simulation is also limited (up to 9).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section3\"\u003e\n \u003cdiv class=\"Heading\"\u003e2.3.3 Sensible Media Simulator\u003c/div\u003e\n \u003cp\u003eAnother simulator compatible with the MPEG-V standard is the \u003cem\u003eSensible Media Simulator\u003c/em\u003e [\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e], whose GUI is shown in Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e. It was developed using Flex and is designed to simulate sensory effects in a car and to test a system designed for that environment, taking advantage of existing devices inside the vehicle to generate sensory effects. It assumes that the car is equipped with a wind system with cooling, heating and ventilation functions, vibrating seats with massage function and heating cables, and an LED lighting system with colour and intensity control. The system takes the value provided by the temperature sensor inside the vehicle as input data at any given time. This enables adaptive control of the temperature of the wind effects to be generated according to that value. The car\u0026apos;s entertainment system provides a GUI displaying an AV content player and the available devices that can be used as sensory effect generation devices or actuators. Users only need to load AV content and SEM metadata. 
It is based on the use of the MPEG-V standard CIDL language to describe the capabilities of the effect generation devices or actuators, as well as the user\u0026apos;s sensory preferences. The IIDL language is also used to describe the information detected by the sensors, as well as the device commands.\u003c/p\u003e\n \u003cp\u003eThis simulator has the main drawback of being designed and being useful for a very specific use case (multimedia system inside a car).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section3\"\u003e\n \u003cdiv class=\"Heading\"\u003e2.3.4 3D sensory effects simulator based on Maxon Cinema 4D\u003c/div\u003e\n \u003cp\u003eIn [\u003cspan class=\"CitationRef\"\u003e36\u003c/span\u003e], a 3D simulator of sensory effects is presented, created with Maxon Cinema 4D\u003csup\u003e10\u003c/sup\u003e, whose GUI is shown in Fig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e. It is based on an event-based temporal synchronisation paradigm, receives files with SEM metadata and simulates the activation/deactivation of the sensory effects during the multimedia presentation. Using the simulator, a user can add, remove and reposition effect generation devices or actuators in a 3D room. The actuators are distributed on the walls and ceiling of the room and are represented by black circles with black cone-shaped lines (the cone indicates the direction of the actuator). In the simulator, the direction is fixed and always towards the centre of the 3D room. The colour of an actuator lights up to indicate that it is active at a given moment. Each colour represents a different type of effect actuator: white for light effects, red for heat effects and green for cold wind effects. The intensity of the colour representation is associated with the intensity of the effect being played. 
In addition to being compatible with the MPEG-V standard positioning system, the simulator implements an extension of the standard to allow authors to specify the location of sensory effects more precisely. To do this, it makes use of a spherical coordinate system for positioning.\u003c/p\u003e\n \u003cp\u003eThis simulator has the following limitations:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eIt only supports 3 types of sensory effects (light, heat and cold wind).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eIt displays very limited information about the effects: the intensity of a sensory effect through 3 shades of the colour assigned to its 3D actuator.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eIt requires the use of a professional software package (Maxon Cinema 4D).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\n \u003cdiv class=\"Heading\"\u003e2.3.5 Real4DAStudio 3D sensory effects simulator [\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/div\u003e\n \u003cp\u003eIn [\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e], a proprietary mulsemedia content authoring tool called Real4DAStudio is presented, which allows the creation of MPEG-V-based SEM metadata and also enables 3D mulsemedia simulation with up to nine different types of effects (light, flashes, temperature, wind, vibration, water diffusion, aromas, fog and rigid body motion). Content-based interfaces and an event-based paradigm are used to synchronise the effects. To the authors\u0026apos; knowledge, the tool is commercially available\u003csup\u003e11\u003c/sup\u003e and is not openly distributed, which is a significant limitation.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3. 
Developed mulsemedia simulation tool","content":"\u003cp\u003eThis section presents the web-based tool developed for AV content playback and simulation of the activation/deactivation of multisensory effects, as specified in the metadata associated with that AV content. This tool will usually run on a user\u0026apos;s main AV content consumption device. It has been developed using HTML5 technology (HTML, CSS, and JavaScript), as well as libraries for 3D web development. HTML and CSS have been used to structure and style the interface, to provide an intuitive user experience. JavaScript handles the application logic, facilitating user interaction and controlling the playback of the AV content. It also captures user interaction events and, if a local environment with real effect generation devices exists, communicates with the Mulsemedia Controller device via a WebSocket-based communication channel.\u003c/p\u003e\n\u003cp\u003eTo work properly, the tool requires that the MPEG-V-compatible SEM metadata describing the sensory effects of the AV content to be played have been generated beforehand. This section therefore first explains the methodology for generating the SEM metadata and then describes the developed tool in more detail.\u003c/p\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1. Mulsemedia content (SEM metadata) generation\u003c/h2\u003e\n \u003cp\u003eAs explained in section 2, there are multiple solutions that can be used to create XML-based SEM metadata. In this work, the use of the SEVino2 tool is proposed; it employs the SEDL language and creates an XML file containing the metadata in accordance with Part 3 of the MPEG-V standard [\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e]. 
A notable benefit of this software is that the SEM metadata files it generates can be easily modified with a simple text editor, allowing users to add their own effects or modify the existing ones.\u003c/p\u003e\n \u003cp\u003eHowever, as previously mentioned, SEVino2 has several issues that affect the development of mulsemedia tools: it does not allow the same type of sensory effect to be activated at the same time from different locations; and it groups all the effects that are activated at the same time in the SEM metadata (in the XML file) within the \u003cem\u003eGroupOfEffects\u003c/em\u003e tag, with a single common activation timestamp in the tag itself. The first issue can be overcome by editing the generated XML file manually: the metadata of an effect created with SEVino2 is copied and pasted, and the location of the copy is changed, which yields an identical effect generated from another position. The second issue means that the metadata must be processed to obtain the activation/deactivation times of each individual effect.\u003c/p\u003e\n \u003cp\u003eThe tool presented in this paper could be adapted to support any type of effect included in the SEM metadata XML file, even if it is not defined in the MPEG-V standard or is not provided by SEVino2.\u003c/p\u003e\n \u003cp\u003eThe AV content files (in common video or MPEG-DASH formats) that are to be displayed in the tool, together with their associated XML SEM metadata files, must be stored on a multimedia server accessible via the web.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2. Modular architecture of the mulsemedia simulation tool\u003c/h2\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003e shows the modular architecture of the developed tool. 
The user selects AV content, which will have its associated XML file with SEM metadata and can play it either in full screen or in the 3D simulator. The different modules of the architecture are explained below.\u003c/p\u003e\n \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.1 AV Media Player\u003c/h2\u003e\n \u003cp\u003eA video player using the HTML video element and the JavaScript \u003cem\u003eDashJS\u003c/em\u003e library has been used in the tool. Depending on whether the AV content (chosen by the user from the drop-down list available at the top left of the GUI\u003csup\u003e12\u003c/sup\u003e) is in MPEG-DASH format, the \u003cem\u003eDashJS MediaPlayerClass\u003c/em\u003e instance will be attached or detached from the HTML video element. In addition, this section includes controls to start and stop playback, move forward or backward in the video, display the playback time and total duration of the video, control the audio volume, and view the content in full screen, as shown below.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.2 XML parser\u003c/h2\u003e\n \u003cp\u003eThis module receives a file containing the SEM metadata and extracts the information of all the characteristics of the effects that can be activated when the user plays the selected AV content. 
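\u003c/p\u003e\n \u003cp\u003eThe parsing step can be illustrated with a minimal Python sketch (the tool itself is written in JavaScript, so this is not the authors\u0026apos; implementation). The sample document is a deliberately simplified stand-in: element and attribute names such as pts, type, duration and location are assumptions chosen for readability, times are assumed to be in seconds, and real SEM files use namespaced MPEG-V elements and attributes. The sketch also shows how effects nested in a GroupOfEffects element can inherit the timestamp of the group, so that each effect ends up with its own activation and deactivation time.\u003c/p\u003e

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a SEM file: element and attribute names are
# illustrative assumptions, not the exact MPEG-V schema (no namespaces).
SAMPLE_SEM = """
<SEM>
  <Effect type="LightType" pts="2.0" duration="1.5" location="center:top:front"/>
  <GroupOfEffects pts="10.0">
    <Effect type="WindType" duration="3.0" location="left:middle:front"/>
    <Effect type="VibrationType" duration="1.0" location="center:bottom:midway"/>
  </GroupOfEffects>
</SEM>
"""


def parse_effects(xml_text):
    """Return one flat record per effect, sorted by activation time."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root:
        if node.tag == "GroupOfEffects":
            # Grouped effects share the timestamp stored on the group tag.
            group_pts = float(node.get("pts"))
            members = [(child, group_pts) for child in node if child.tag == "Effect"]
        elif node.tag == "Effect":
            members = [(node, float(node.get("pts")))]
        else:
            continue  # ignore anything this sketch does not model
        for effect, start in members:
            duration = float(effect.get("duration", "0"))
            records.append({
                "type": effect.get("type"),
                "start": start,
                "end": start + duration,
                "location": effect.get("location"),
            })
    return sorted(records, key=lambda r: r["start"])


def active_effects(records, playback_time):
    """Effects whose [start, end) interval contains the playback time."""
    return [r for r in records if r["start"] <= playback_time < r["end"]]
```

 \u003cp\u003eWith flat records of this kind, checking which effects are active at a given playback time reduces to a simple interval test, as in the hypothetical active_effects helper.\u003c/p\u003e\n \u003cp\u003e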
All the important information about each effect is stored in internal data structures, such as its type, activation/deactivation times, activation intensity, and whether it has gradual activation or deactivation (\u003cem\u003efade-in\u003c/em\u003e or \u003cem\u003efade-out\u003c/em\u003e).\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec17\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.3 Sensory effects manager\u003c/h2\u003e\n \u003cp\u003eThe internal data structures generated by the XML parser module are passed to the \u003cem\u003eSensory effects manager\u003c/em\u003e, which also receives information about the playback point of the AV content player. This module is responsible for checking whether, at the current playback point, it is necessary to graphically represent the activation of any effects in the 3D scene at a specific location. So, it will pass the precise indications to the 3D simulation module.\u003c/p\u003e\n \u003cp\u003eFurthermore, if there exists an MC device, it also transfers this information to the communications module so that, if applicable, the corresponding message is sent to that device via a previously established WebSocket channel. To do this, it creates MPEG-V-compatible messages to activate/deactivate the sensory effects with the appropriate parameters, always considering the user\u0026apos;s preferences, and passes them to the communications module.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec18\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.4 Communications module\u003c/h2\u003e\n \u003cp\u003eAs mentioned above, the tool includes a communications module to establish a connection (using WebSocket technology, through an intermediate server) with the MC device (see Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e), if it exists, and to send it the corresponding MPEG-V standard-compliant messages to activate/deactivate the sensory effects properly. 
The communications module includes a connection control mechanism that automatically reconnects to the server after sporadic interruptions of the network (usually a WiFi network) or of the WebSocket channel.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec19\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.5 User preferences module\u003c/h2\u003e\n \u003cp\u003eThe tool can store users\u0026rsquo; preferences so that the settings of the effect generation devices can be adjusted to improve the quality of their multimedia consumption experience (i.e., QoE) through personalised services, increasing their level of enjoyment and satisfaction. The tool allows users to choose which of the available effects they want to activate/deactivate, and to configure them during the simulation, discarding those they do not want. They can also configure the characteristics of the effects (e.g. maximum or minimum intensities). If the MC device is present, only the commands corresponding to the activation/deactivation of the accepted effects, considering the user\u0026apos;s preferences, are passed to the communications module to be sent to that device via WebSocket.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec20\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.6 3D simulation module\u003c/h2\u003e\n \u003cp\u003eThe 3D simulation module, which is explained in more detail in the following section, is responsible for representing, in a 3D space that simulates a rectangular room, the different effects that are activated/deactivated during the playback of AV content. 
The 3D space contains a virtual screen inside where the AV content is rendered and 3D elements that represent the activation/deactivation of effects in each of the 45 positions considered in the MPEG-V standard (see section 2.1), around the user, who is assumed to be in the centre of the room.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3. Web-based 3D simulation environment\u003c/h2\u003e\n \u003cp\u003eTo simulate the activation/deactivation of (one or more) sensory effects in each of the 45 positions around the user covered by the MPEG-V standard, a 3D web-based environment has been created using Blender\u003csup\u003e13\u003c/sup\u003e and three.js\u003csup\u003e14\u003c/sup\u003e. On the one hand, as mentioned above, this 3D environment simulates a 3D room with a virtual 2D screen or display on the back wall, on which the video content will be displayed (Fig. \u003cspan class=\"InternalRef\"\u003e11\u003c/span\u003e). The user is assumed to be in the centre of the room, facing the virtual 2D display. On the other hand, to visually simulate the activation/deactivation of the effects in each of the 45 defined positions, 45 transparent spheres have been placed in those positions in the 3D environment (Fig. \u003cspan class=\"InternalRef\"\u003e11\u003c/span\u003e). Each sphere will change its colour as sensory effects are activated in or from its position. If several effects are activated in the same position at the same time, the first effect to be activated will define the colour of the sphere in that position. For the rest of the concurrent effects, rings will be drawn around that sphere in the colour corresponding to each of them, as shown in Fig. 
\u003cspan class=\"InternalRef\"\u003e11\u003c/span\u003e.\u003c/p\u003e\n \u003cp\u003eAs can be seen, the virtual 2D display with the video viewer is situated on the wall facing the user, and the spheres may obstruct the viewer\u0026apos;s view of the video. This is not problematic, as the user can interact with the 3D scene using the mouse (e.g., zoom in, move or rotate) to obtain a better view of the simulated activation/deactivation of the different effects (Fig. \u003cspan class=\"InternalRef\"\u003e12\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eTo support the user\u0026rsquo;s interaction with the 3D environment, the three.js \u003cem\u003eRaycaster\u003c/em\u003e\u003csup\u003e15\u003c/sup\u003e class has been used, which detects which object the mouse pointer is over at any given moment within the 3D space.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\n \u003ch2\u003e3.4. Developed mulsemedia simulation tool\u003c/h2\u003e\n\u003c/div\u003e\n\u003ch3\u003e3.4.1 Graphical user interface\u003c/h3\u003e\n\u003cp\u003eAs illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e, the GUI of the developed web-based tool is composed of several key components. At the top of the screen, a drop-down menu is available to select an AV content item from those available on the web server, along with the playback controls (play/pause button, volume control, and progress bar).\u003c/p\u003e\u003cp\u003eThe current version of the tool simulates the activation/deactivation of eight sensory effects: fog, light, aromas, mist, temperature, wind, vibration and flashes (to be extended in further work). 
At the top, there is an information panel showing the types of effects and the colour assigned to each one (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eColours assigned to the 8 sensory effects supported by the simulation tool\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eType of effect\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eColour\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTemperatureType (Heating/Cooling)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRed\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eScentType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGreen\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eBlue\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWindType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePurple\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eSprayingType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAqua\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFogType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eOlive\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVibrationType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eBlack\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFlashType\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePink\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe user can also select which effects they want to include in the simulation (and, if applicable, in the real mulsemedia system) and their properties. After clicking on the button labelled \u0026lsquo;\u003cem\u003eEffects\u003c/em\u003e\u0026rsquo;, the user preferences window will appear (Fig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003e). By default, the activation/deactivation of all the effects included in the downloaded SEM metadata (XML file) is included in the simulation, but users can configure only those that interest them from the user preferences panel and exclude the others (for example, because they bother or annoy the user). This way, the tool will adjust the settings of the simulation (and, if a real scenario is available, of the generation devices) based on these preferences. 
For example, in a real mulsemedia scenario, a pregnant woman may not tolerate certain smells, or may not want to experience sudden vibrations or abrupt movements, so she can exclude unpleasant aromas and vibration effects from the mulsemedia experience.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe central part of the tool shows the simulated 3D room with the transparent spheres placed in the 45 positions specified in the MPEG-V standard. Initially, the 3D scene is shown from the back of the room so that all the positions (spheres) represented can be seen. However, as already mentioned, the user can interact with it (rotating, moving or zooming it in or out) as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003e.\u003c/p\u003e\u003cdiv id=\"Sec24\" class=\"Section3\"\u003e\u003cdiv class=\"Heading\"\u003e3.4.2 Tool processes\u003c/div\u003e\u003cp\u003eThe tool involves the following processes\u003csup\u003e16\u003c/sup\u003e:\u003c/p\u003e\u003cp\u003e\u003cem\u003ea) AV content selection\u003c/em\u003e\u003c/p\u003e\u003cp\u003eInitially, the user must select an item from the AV content selector (drop-down list). That list contains all the AV content items available on the server (in the current version, it includes video files in the most common formats, e.g., MP4, and MPEG-DASH index files with the \u0026lsquo;\u003cem\u003e.mpd\u003c/em\u003e\u0026rsquo; extension). Upon selection, the XML file with the SEM metadata associated with the selected AV content is also automatically downloaded. The XML parser module processes it, extracting the properties of each of the included effects. 
These properties include the type of effect, its position in the 3D space, and the start and end times of the effect generation, among others.\u003c/p\u003e\u003cp\u003e\u003cem\u003eb) Playback of video content\u003c/em\u003e\u003c/p\u003e\u003cp\u003eThe user can start (and stop) the playback of the AV content using the button labelled \u0026lsquo;\u003cem\u003ePlay/Pause\u003c/em\u003e\u0026rsquo;, and can skip forward or backward during playback (by clicking on the progress bar); sensory effects are still activated/deactivated at the precise instants defined in the SEM metadata.\u003c/p\u003e\u003cp\u003eBy default, the web-based tool displays the 3D simulation environment described above, in which the user can view the video playback on the virtual 2D screen included in it and see the 3D simulation of how the effects are dynamically activated/deactivated at each position around the user as playback progresses.\u003c/p\u003e\u003cp\u003eTo find out which effects are active at a precise instant and in a specific position (i.e., in a coloured sphere), the user can click on the corresponding sphere, and a pop-up window will appear on the right-hand side of the page showing information on all the effects active in that position (Fig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e15\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003ec) Full-screen playback\u003c/em\u003e\u003c/p\u003e\u003cp\u003eThe tool also allows users to view the video content in full-screen mode (by selecting the \u0026lsquo;\u003cem\u003eFull screen\u003c/em\u003e\u0026rsquo; checkbox at the top left, near the AV content selector, see Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e), without displaying the 3D effects simulation, for a better viewing experience. 
This option has been provided so that, if the tool is used in a real mulsemedia scenario (with an MC device and real sensory effect generation devices), users can view the AV content using the entire screen and enjoy a more realistic and complete mulsemedia experience.\u003c/p\u003e\u003cp\u003e\u003cem\u003ed) Activation/deactivation of real effects\u003c/em\u003e\u003c/p\u003e\u003cp\u003eIf the user has a real mulsemedia environment and the Mulsemedia Controller device, the communications module will be used to send (via the WebSocket protocol) the relevant MPEG-V compliant commands for activating/deactivating effects. To facilitate this process, the tool includes several elements at the top of the page: a checkbox labelled \u0026lsquo;\u003cem\u003eMulsemedia Controller\u003c/em\u003e\u0026rsquo;; two boxes for entering the IP address or name of the WebSocket server and the port on which that service is active; and a button labelled \u0026lsquo;Connect\u0026rsquo; to initiate the connection. Unless the checkbox is activated, the other elements remain disabled. The colour of the button labelled \u0026lsquo;Connect\u0026rsquo; will change to green or dark grey to indicate connection to or disconnection from the server, respectively, and adjacent text messages will appear indicating the status (Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e16\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eIn this case, therefore, in addition to simulating the activation/deactivation of sensory effects in the 3D environment, the tool will also send the corresponding MPEG-V compliant messages to the WebSocket server at the moment of showing the activation/deactivation (or even before, depending on the properties of the sensory effect generation devices\u003csup\u003e17\u003c/sup\u003e), so that the server, in turn, forwards them to the Mulsemedia Controller device. 
Figure\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e17\u003c/span\u003e shows an example of a message sent to activate a wind effect (\u003cem\u003eWindType\u003c/em\u003e) on the generation device (e.g., a fan) with the identifier \u0026lsquo;\u003cem\u003ewind001\u003c/em\u003e\u0026rsquo; and an intensity of \u0026lsquo;\u003cem\u003e30\u003c/em\u003e\u0026rsquo; (% of its maximum intensity), at the absolute time (\u003cem\u003eabsTime\u003c/em\u003e) \u0026lsquo;\u003cem\u003e1:30:23\u003c/em\u003e\u0026rsquo;.\u003c/p\u003e\u003cp\u003eThe communication protocol between the simulation tool and the MC device, via WebSocket technology, is beyond the scope of this paper, and its complete implementation is left for further work.\u003c/p\u003e\u003c/div\u003e"},{"header":"4. Conclusions","content":"\u003cp\u003eIn this article, a web-based 3D mulsemedia simulation tool, compatible with the specifications in the MPEG-V standard, has been presented. It enables creators of mulsemedia systems and applications based on that standard to save time and money when checking their correct performance, without the need for real physical sensory effect generation equipment. The tool makes use of AV content and its corresponding SEM metadata (which can be generated by using several MPEG-V-compliant mulsemedia authoring tools). It allows users to check visually, in a virtual 3D environment, that the effects included in the SEM metadata are activated/deactivated correctly (i.e., whether they are synchronised with the AV content playback) at any of the 45 positions around the user specified by the standard. 
The tool also allows the user to customise which effects are to be incorporated within the simulation (by default, all those included in the metadata are simulated) and their properties.\u003c/p\u003e\u003cp\u003eIn view of further integration of the tool in real physical mulsemedia scenarios, a communications module based on WebSocket technology has been included in it to facilitate the communication with a WebSocket server. Through this server, the MPEG-V compliant activation/deactivation messages will be forwarded to a Mulsemedia Controller (MC) device. The development of that device and of the communications protocol with it via WebSocket is left for further work. In addition, the positioning model employed in MPEG-V imposes constraints on the number of potential locations for sensory effect generation around the user. In future versions of the tool, the simulated 3D environment will be modified to incorporate an additional positioning model that also takes spherical coordinates into consideration, and more types of effects will be included. The additional positioning model will enable the generation of effects to be verified at more specific locations, in addition to the pre-established locations considered in the MPEG-V standard.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFunding:\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe article includes funding (included in the ACK section of the paper). 
\u003cem\u003eThe publication was supported, in part, by the following grants with refs.: PID2021-126645OB-I00, funded by MICIU/AEI/10.13039/501100011033/ and by \u0026ldquo;ERDF A way of making Europe\u0026rdquo;; CIAICO/2022/025, funded by the Conselleria de Innovaci\u0026oacute;n, Universidades, Ciencia y Sociedad Digital of Generalitat Valenciana (DOGV 8919/05.10.2020); and ACIF/2021/192 for pre-doctoral researchers, funded by Generalitat Valenciana under the \u0026lsquo;Programa I+D+i \u0026rsquo; and the European Social Fund (ESF).\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contribution Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors (F.B., E.V., LL.S. and J.G.) contributed to the development and testing of the simulation tool. E.V. and J.G. designed the 3D simulation scenario. E.V., F.B. and LL.S. wrote the main part of the code of the tool. J.G. checked it and corrected some detected mistakes. Additionally, F.B. was the supervisor of the work and the writer of the first draft of the manuscript. All the authors (F.B., E.V., LL.S. and J.G.) 
commented on previous versions of the manuscript, and, finally, read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis publication was supported, in part, by the following grants with refs.: PID2021-126645OB-I00, funded by Ministerio de Ciencia, Innovaci\u0026oacute;n y Universidades (MICIU), Agencia Estatal de Investigaci\u0026oacute;n (AEI) (MICIU/AEI/10.13039/501100011033/) and by \u0026ldquo;ERDF A way of making Europe\u0026rdquo;; CIAICO/2022/025, funded by the Conselleria de Innovaci\u0026oacute;n, Universidades, Ciencia y Sociedad Digital of Generalitat Valenciana (DOGV 8919/05.10.2020); and ACIF/2021/192 for pre-doctoral researchers, funded by Generalitat Valenciana under the \u0026lsquo;Programa I+D+i\u0026rsquo; and the European Social Fund (ESF).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGhinea G, Timmerer C, Lin W, Gulliver SR (2014) \u0026lsquo;Mulsemedia: State of the Art, Perspectives, and Challenges.\u0026rsquo;, \u003cem\u003eACM Transactions on Multimedia Computing, Communications, and Applications\u003c/em\u003e, vol. 11, no. 1s, pp. 1\u0026ndash;23, Oct. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/2617994\u003c/span\u003e\u003cspan address=\"10.1145/2617994\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVelasco C, Obrist M (2020) \u0026lsquo;Multisensory Experiences: Where the senses meet technology\u0026rsquo;, \u003cem\u003eOxford\u003c/em\u003e, p. 112. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/oso/9780198849629.001.0001\u003c/span\u003e\u003cspan address=\"10.1093/oso/9780198849629.001.0001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWaltl M, Timmerer C, Hellwagner H (2010) \u0026lsquo;Improving the quality of multimedia experience through sensory effects\u0026rsquo;, in \u003cem\u003e2010 2nd International Workshop on Quality of Multimedia Experience, QoMEX 2010 - Proceedings\u003c/em\u003e, IEEE, Jun. pp. 124\u0026ndash;129. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/QOMEX.2010.5517704\u003c/span\u003e\u003cspan address=\"10.1109/QOMEX.2010.5517704\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWaltl M, Rainer B, Timmerer C, Hellwagner H (2012) \u0026lsquo;A toolset for the authoring, simulation, and rendering of sensory experiences\u0026rsquo;, in \u003cem\u003eMM 2012 - Proceedings of the 20th ACM International Conference on Multimedia\u003c/em\u003e, pp. 1469\u0026ndash;1472. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/2393347.2396522\u003c/span\u003e\u003cspan address=\"10.1145/2393347.2396522\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYuan Z, Ghinea G, Muntean GM (Jan. 2015) Beyond multimedia adaptation: Quality of experience-aware multi-sensorial media delivery. IEEE Trans Multimedia 17(1):104\u0026ndash;117. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TMM.2014.2371240\u003c/span\u003e\u003cspan address=\"10.1109/TMM.2014.2371240\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRainer B, Waltl M, Cheng E, Shujau M, Timmerer C, Davis S (2012) \u0026lsquo;Investigating the impact of sensory effects on the Quality of Experience and emotional response in web videos\u0026rsquo;, in \u003cem\u003e2012 Fourth International Workshop on Quality of Multimedia Experience\u003c/em\u003e, Melbourne, VIC (Australia), pp. 278\u0026ndash;283\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e\u0026lsquo;ISO/IEC 23005 (2011) Information technology - Media context and control (MPEG-V)\u0026rsquo;, 2011\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJalal L, Anedda M, Popescu V, Murroni M \u0026lsquo;Internet of Things for Enabling Multi Sensorial TV in Smart Home\u0026rsquo;, in (2018) \u003cem\u003eIEEE Broadcast Symposium, BTS 2018\u003c/em\u003e, IEEE, Oct. 2018, pp. 1\u0026ndash;5. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/BTS.2018.8550959\u003c/span\u003e\u003cspan address=\"10.1109/BTS.2018.8550959\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSulema Y (2016) \u0026lsquo;Mulsemedia vs. Multimedia: State of the art and future trends\u0026rsquo;, in \u003cem\u003eInternational Conference on Systems, Signals, and Image Processing\u003c/em\u003e, IEEE, May pp. 1\u0026ndash;5. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/IWSSIP.2016.7502696\u003c/span\u003e\u003cspan address=\"10.1109/IWSSIP.2016.7502696\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTal I et al (2020) \u0026lsquo;Mulsemedia in education: A case study on learner experience, motivation, and knowledge gain\u0026rsquo;, in \u003cem\u003eCSEDU Conference\u003c/em\u003e, pp. 180\u0026ndash;187\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMohana M, Valliammal N, Suvetha V, Krishnaveni M, Subashini P, Ghinea G \u0026lsquo;A Study on Technology-Enhanced Mulsemedia Learning for Enhancing Learner\u0026rsquo;s Experience in E-Learning\u0026rsquo;, (2023) \u003cem\u003eInternational Conference on Network, Multimedia and Information Technology, NMITCON 2023\u003c/em\u003e, 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/NMITCON58196.2023.10275964\u003c/span\u003e\u003cspan address=\"10.1109/NMITCON58196.2023.10275964\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMuntean CH, Tal I, Bogusevschi D, Bratu M, Bi T, Muntean GM (2024) \u0026lsquo;Mulseplayer: A Multi-Sensorial Media Content Delivery Solution to Enhance End-User Quality of Experience\u0026rsquo;, \u003cem\u003eIEEE International Symposium on Broadband Multimedia Systems and Broadcasting, BMSB\u003c/em\u003e, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/BMSB62888.2024.10608351\u003c/span\u003e\u003cspan address=\"10.1109/BMSB62888.2024.10608351\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMelo M et al (2022) Immersive multisensory virtual reality technologies for virtual tourism: A study of the user\u0026rsquo;s 
sense of presence, satisfaction, emotions, and attitudes. Multimed Syst. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00530-022-00898-7\u003c/span\u003e\u003cspan address=\"10.1007/s00530-022-00898-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMarfil D, Boronat F, Gonzalez J, Sapena A (2022) Integration of Multisensorial Effects in Synchronised Immersive Hybrid TV Scenarios. IEEE Access 10:79071\u0026ndash;79089. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2022.3194170\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2022.3194170\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCovaci A, Zou L, Tal I, Muntean GM, Ghinea G (2018) \u0026lsquo;Is multimedia multisensorial? - A review of mulsemedia systems\u0026rsquo;, Aug. 01, \u003cem\u003eAssociation for Computing Machinery\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/3233774\u003c/span\u003e\u003cspan address=\"10.1145/3233774\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eObrist M, Tuch AN, Hornbaek K (2014) \u0026lsquo;Opportunities for odor: experiences with smell and implications for technology\u0026rsquo;, in \u003cem\u003eCHI \u0026rsquo;14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems\u003c/em\u003e, May\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYuan Z, Bi T, Muntean GM, Ghinea G (2015) \u0026lsquo;Perceived synchronization of mulsemedia services\u0026rsquo;, \u003cem\u003eIEEE Trans Multimedia\u003c/em\u003e, vol. 17, no. 7, pp. 957\u0026ndash;966, Jul. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TMM.2015.2431915\u003c/span\u003e\u003cspan address=\"10.1109/TMM.2015.2431915\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAdemoye OA, Murray N, Muntean G-M, Ghinea G (Aug. 2016) Audio Masking Effect on Inter-Component Skews in Olfaction-Enhanced Multimedia Presentations. ACM Trans Multimedia Comput Commun Appl 12(4):1\u0026ndash;14. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/2957753\u003c/span\u003e\u003cspan address=\"10.1145/2957753\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMurray N, Ademoye OA, Ghinea G, Muntean G-M (2017) \u0026lsquo;A Tutorial for Olfaction-Based Multisensorial Media Application Design and Evaluation\u0026rsquo;, \u003cem\u003eACM Comput Surv\u003c/em\u003e, vol. 50, no. 5, pp. 1\u0026ndash;30, Sep. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/3108243\u003c/span\u003e\u003cspan address=\"10.1145/3108243\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMurray N, Muntean GM, Qiao Y, Lee B (2018) Olfaction-enhanced multimedia synchronization. MediaSync: Handbook on Multimedia Synchronization. Springer International Publishing, Cham, pp 319\u0026ndash;356. 
doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-3-319-65840-7_12\u003c/span\u003e\u003cspan address=\"10.1007/978-3-319-65840-7_12\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAkyildiz IF, Guo H, Dai R, Gerstacker W (2023) \u0026lsquo;Mulsemedia Communication Research Challenges for Metaverse in 6G Wireless Systems\u0026rsquo;, \u003cem\u003eITU Journal on Future and Evolving Technologies\u003c/em\u003e, vol. 4, no. 4, Dec\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSaleme EB, Covaci A, Mesfin G, Santos CAS, Ghinea G (2019) \u0026lsquo;Mulsemedia DIY: A survey of devices and a tutorial for building your own mulsemedia environment\u0026rsquo;, Jun. 01, \u003cem\u003eAssociation for Computing Machinery\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/3319853\u003c/span\u003e\u003cspan address=\"10.1145/3319853\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWaltl M, Rainer B, Hellwagner H (Feb. 2013) An end-to-end tool chain for Sensory Experience based on MPEG-V. Signal Process Image Commun 28(2):136\u0026ndash;150. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/J.IMAGE.2012.10.009\u003c/span\u003e\u003cspan address=\"10.1016/J.IMAGE.2012.10.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ede Mattos DP, Muchaluat-Saade DC (2018) \u0026lsquo;STEVE: a Hypermedia Authoring Tool based on the Simple Interactive Multimedia Model\u0026rsquo;, in \u003cem\u003eDocEng \u0026rsquo;18: Proceedings of the ACM Symposium on Document Engineering 2018\u003c/em\u003e, Halifax NS Canada, Aug. pp. 
1\u0026ndash;10\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e\u0026lsquo;International standard ISO/IEC 23005-3 (2013) Information technology \u0026mdash; Media context and control \u0026mdash; Part 3: Sensory information\u0026rsquo;\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e\u0026lsquo;International standard ISO/IEC 23005-3 (2013) Information technology \u0026mdash; Media context and control \u0026mdash; Part 3: Sensory information (MPEG-V Part 3)\u0026rsquo;\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChoi B, Lee E-S, Yoon K (2011) \u0026lsquo;Streaming Media with Sensory Effect\u0026rsquo;, in \u003cem\u003eInternational Conference on Information Science and Applications\u003c/em\u003e, Jeju, Korea (South), pp. 1\u0026ndash;6\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim S-K (2013) Authoring multisensorial content. Signal Process Image Commun 28(2):162\u0026ndash;167\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDanieau F, Bernon J, Fleureau J, Guillotel P, Mollet N, Christie M (2013) \u0026lsquo;H-Studio: An Authoring Tool for Adding Haptic and Motion Effects to Audiovisual Content\u0026rsquo;, in \u003cem\u003eProceedings of the adjunct publication of the 26th annual ACM symposium on User interface software and technology\u003c/em\u003e, pp. 83\u0026ndash;84\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShin SH, Ha KS, Yun HO, Nam YS (2016) \u0026lsquo;Realistic media authoring tool based on MPEG-V international standard\u0026rsquo;, in \u003cem\u003eInternational Conference on Ubiquitous and Future Networks, ICUFN\u003c/em\u003e, IEEE, Jul. pp. 730\u0026ndash;732. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ICUFN.2016.7537133\u003c/span\u003e\u003cspan address=\"10.1109/ICUFN.2016.7537133\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Mattos DP, Muchaluat-Saade DC, Ghinea G (2021) \u0026lsquo;Beyond Multimedia Authoring: On the Need for Mulsemedia Authoring Tools\u0026rsquo;, \u003cem\u003eACM Comput Surv\u003c/em\u003e, pp. 1\u0026ndash;31\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Mattos DP, Muchaluat-Saade DC, Ghinea G (2020) \u0026lsquo;An Approach for Authoring Mulsemedia Documents Based on Events\u0026rsquo;, \u003cem\u003eInternational Conference on Computing, Networking and Communications, ICNC 2020\u003c/em\u003e, pp. 273\u0026ndash;277, Feb. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ICNC47757.2020.9049485\u003c/span\u003e\u003cspan address=\"10.1109/ICNC47757.2020.9049485\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVieira R, Ivanov M, Abreu R, dos Santos JAF, Mattos D, Muchaluat-Saade DC (2023) \u0026lsquo;Autoria de Aplica\u0026ccedil;\u0026otilde;es Multissensoriais para TV 3.0 com a Ferramenta STEVE\u0026rsquo;, pp. 143\u0026ndash;149, Oct. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5753/WEBMEDIA_ESTENDIDO.2023.236124\u003c/span\u003e\u003cspan address=\"10.5753/WEBMEDIA_ESTENDIDO.2023.236124\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ede Abreu RS, Mattos D, Santos Jd, Ghinea G, Muchaluat-Saade DC (2021) \u0026lsquo;Toward content-driven intelligent authoring of mulsemedia applications\u0026rsquo;. 
IEEE Multimedia 28(1):7\u0026ndash;16\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim S-K, Joo Y-S, Lee Y (2013) Sensible Media Simulation in an Automobile Application and Human Responses to Sensory Effects. ETRI J 35(6):1001\u0026ndash;1010\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJosu\u0026eacute; M et al (2018) \u0026lsquo;Modeling sensory effects as first-class entities in multimedia applications\u0026rsquo;, \u003cem\u003eProceedings of the 9th ACM Multimedia Systems Conference, MMSys\u003c/em\u003e, pp. 225\u0026ndash;236. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/3204949.3204967\u003c/span\u003e\u003cspan address=\"10.1145/3204949.3204967\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Footnotes","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://sensoryacumen.com/\u003c/span\u003e\u003cspan address=\"https://sensoryacumen.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.olorama.com/\u003c/span\u003e\u003cspan address=\"https://www.olorama.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.bhaptics.com/\u003c/span\u003e\u003cspan address=\"https://www.bhaptics.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan 
class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.portaventuraworld.com\u003c/span\u003e\u003cspan address=\"https://www.portaventuraworld.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.oceanografic.org/actividad/cine-4d/\u003c/span\u003e\u003cspan address=\"https://www.oceanografic.org/actividad/cine-4d/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e or \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.heroncity.com/valencia/heron-city-paterna/4dx\u003c/span\u003e\u003cspan address=\"https://www.heroncity.com/valencia/heron-city-paterna/4dx\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e The MPEG-V standard defines up to 45 positions around the user.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e The simulator uses the HTML \u0026lt;video\u0026gt; tag, so only the video formats supported by the browser in use can be played (e.g., MP4, WebM or OGG).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e The development of the MC device and of the WebSocket-based communication protocol with it is beyond the scope of this paper and is left for future work.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e Java Framework for the VLC Media Player.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/caprica/vlcj\u003c/span\u003e\u003cspan address=\"https://github.com/caprica/vlcj\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e Maxon Cinema 4D: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.maxon.net/cinema-4d\u003c/span\u003e\u003cspan address=\"https://www.maxon.net/cinema-4d\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e Real4DStudio \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.real4dhub.or.kr\u003c/span\u003e\u003cspan address=\"http://www.real4dhub.or.kr\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e When selected with the mouse, the drop-down list of the tool will automatically list all contents available in a specific folder on the media server, provided that both the AV content and its associated SEM metadata files (both with the same name but different extension) are available.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.blender.org/\u003c/span\u003e\u003cspan address=\"https://www.blender.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://threejs.org/\u003c/span\u003e\u003cspan address=\"https://threejs.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 
2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://threejs.org/docs/#api/en/core/Raycaster\u003c/span\u003e\u003cspan address=\"https://threejs.org/docs/#api/en/core/Raycaster\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (last access: June 2025)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cem\u003eNote for reviewers\u003c/em\u003e: a preliminary draft of a video about the tool can be watched at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://youtu.be/TVm9APxoE3o\u003c/span\u003e\u003cspan address=\"https://youtu.be/TVm9APxoE3o\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (it is in Spanish, for which we apologise). If the paper is accepted, an improved video will be prepared and its link provided here, and the source files of the tool will be made publicly available.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e Some effect generation devices take time to start generating the desired effect, or the effect takes time to be noticed by the users. Therefore, activation messages should be sent in advance (i.e., a few seconds before the user is supposed to notice the effect).
As an example, consider a fan device that generates a wind effect, in which the speed of the blades starts from zero (wind intensity of 0) when it is activated and takes a while to reach the desired wind intensity included in the received activation message.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[{"identity":"34e166ce-39c1-48c7-9b1d-6fda57c7d871","identifier":"10.13039/100014440","name":"Ministerio de Ciencia, Innovación y Universidades","awardNumber":"PID2021-126645OB-I00","order_by":0},{"identity":"89d25423-a758-45fa-83b0-320c1add6637","identifier":"10.13039/501100011033","name":"Agencia Estatal de Investigación","awardNumber":"PID2021-126645OB-I00","order_by":1},{"identity":"b0b70d1f-4203-474d-9ca4-1fa8a90535f1","identifier":"10.13039/501100008530","name":"European Regional Development Fund","awardNumber":"PID2021-126645OB-I00, ERDFMICIU/AEI/10.13039/501100011033/ ","order_by":2},{"identity":"7ddff5af-cd90-4871-a081-beff9bae509a","identifier":"10.13039/501100016386","name":"Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital, Generalitat Valenciana","awardNumber":"CIAICO/2022/025","order_by":3},{"identity":"8443ccff-5f8a-4971-8215-aa425acf50c2","identifier":"10.13039/501100003359","name":"Generalitat Valenciana","awardNumber":"ACIF/2021/192","order_by":4}],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Universitat Politècnica de València","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email 
protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Immersivity, MPEG-V, Sensory Effects, Sensory Experience, Mulsemedia Simulation, Sensory Effect Simulator","lastPublishedDoi":"10.21203/rs.3.rs-7046647/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7046647/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cem\u003eTraditional multimedia systems normally include audio-visual content that only stimulates the senses of sight and hearing. However, stimulating additional senses can provide more immersive and realistic experiences, increasing the users’ Quality of Experience (QoE). For years, the research community has been working on the development of multimedia systems that include sensory effect metadata associated with the audio-visual content and capable of generating these effects, thus stimulating all the users’ senses. Examples of effects are scents (smell), flavours (taste), vibrations, pressure, wind effects (touch), special lighting, temperature, humidity, smoke, sprays (environmental effects), etc. There already exist some related solutions and standards (e.g., MPEG-V) that enable the integration of real sensory effect generation devices into multimedia systems. However, once these integrations are designed, having a complete physical setup with multiple physical devices in different positions around the user to test their performance is costly and allows little flexibility. A faster and cheaper alternative method involves the use of simulators. In this article, an MPEG-V compliant web-based 3D simulator is presented. 
The user can select audio-visual content, visualise it and check the correct activation/deactivation of each sensory effect during playback, as well as the position from which they are generated, among the 45 positions around the user defined in the standard. Additionally, a communication module with a controller device has been included to be used when it is integrated in a real mulsemedia environment.\u003c/em\u003e\u003c/p\u003e","manuscriptTitle":"MPEG-V compliant 3D simulation tool for multimedia playback with sensory effects","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-07 03:43:41","doi":"10.21203/rs.3.rs-7046647/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"90410af3-93d2-4d84-90f5-328c8f7abb3d","owner":[],"postedDate":"July 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":51053179,"name":"Theoretical Computer Science"},{"id":51053180,"name":"Computer Architecture and Engineering"},{"id":51053181,"name":"Publishing/Media"},{"id":51053182,"name":"Media Studies"}],"tags":[],"updatedAt":"2025-07-07T03:43:42+00:00","versionOfRecord":[],"versionCreatedAt":"2025-07-07 
03:43:41","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7046647","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7046647","identity":"rs-7046647","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}



Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00