Stream3DGen: Real-time, Streaming, and Highly Controllable Text-to-3D-Scene Generation via Distilled 3D-aware Diffusion Transformers

This is a preprint and has not been peer reviewed. Data may be preliminary.
Authorea, 23 February 2026, Version 1 (latest version)
Authors: Wenhan Qian (ORCID 0009-0001-5922-9649, [email protected]) and Ziqu Shang
https://doi.org/10.22541/au.177188129.94512097/v1

Abstract
The rapid advancement of Text-to-3D-Scene Generation (T23DSG) promises to revolutionize digital content creation, yet existing methods grapple with critical limitations: inefficient generation, inadequate long-term temporal consistency, insufficient fine-grained control, and a lack of support for streaming outputs. To overcome these challenges, we introduce Stream3DGen, a novel research framework designed for real-time, streaming, and highly controllable T23DSG.
Our approach balances high scene fidelity, low latency, and robust long-term 3D geometric and semantic consistency. At the core of Stream3DGen is a revamped 3D-aware Diffusion Transformer architecture, which incorporates an Adaptive Spatial-Temporal Attention module and leverages a Dynamic Scene Field representation for efficient, streamable updates and rendering. The training framework employs Hierarchical Flow Matching alongside a novel Dynamic Scene Buffer with a Predictive Chunking strategy to ensure spatio-temporal coherence. Furthermore, a specialized Multi-stage Neural Scene Field Distillation process compresses the generative model into a highly efficient student, enabling real-time generation. Evaluated on dynamic prompts from SceneBench and a custom Controllability-TestSet, Stream3DGen achieves superior Scene Fidelity, Temporal Coherence, and Control Adherence compared to state-of-the-art baselines, while maintaining real-time performance. Our contributions include the first real-time streaming T23DSG framework, the innovative 3D-aware Diffusion Transformer with its components, and the efficient multi-stage distillation strategy.

Supplementary Material
stream3dgen.pdf (1.90 MB)

Information
Version history: Version 1, 23 February 2026
Copyright: This work is licensed under a Creative Commons Attribution 4.0 International License.
Keywords: computing and processing; control; diffusion; streaming; temporal consistency; text-to-3d-scene generation

Affiliations
Wenhan Qian, Shangqiu Normal University
Ziqu Shang, Shangqiu Normal University

Citation
Wenhan Qian, Ziqu Shang.
Stream3DGen: Real-time, Streaming, and Highly Controllable Text-to-3D-Scene Generation via Distilled 3D-aware Diffusion Transformers. Authorea. 23 February 2026. DOI: https://doi.org/10.22541/au.177188129.94512097/v1