Integrated photonic 3D tensor processing engine

Liangjun Lu, Yue Wu, Ziheng Ni, Xin Li, Yuanxun Wang, Jianping Chen, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5399911/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted.

Abstract

Optical computing leverages high bandwidth, low latency, and power efficiency, and is considered one of the most effective solutions for accelerating deep learning tasks. However, mainstream photonic hardware accelerators are primarily optimized for two-dimensional (2D) matrix-vector multiplications (MVMs). To implement three-dimensional (3D) convolutional neural networks (CNNs), high-order tensors must be reshaped, duplicated, and cached in the electrical domain according to the size of the accelerators before computation, leading to extra memory usage and time overheads.
Additionally, synchronization across multiple channels depends on external electronic clocks, which increases the complexity of the system. In this work, we propose an integrated photonic 3D tensor processing engine (3D-TPE) based on the interweaving of time, wavelength, and space. Data caching, computation, and synchronization are realized in the optical domain, reducing memory and time usage and simplifying the system. Optical caching and synchronization are achieved with an optical tunable delay line chip supporting versatile clock frequencies up to 200 GHz, and optical computing is accomplished with a crossbar chip based on dual-coupled micro-ring resonators (MRRs) with a 3-dB passband width of 50 GHz. We verify the processing capabilities of the 3D-TPE at clock frequencies ranging from 10 GHz to 30 GHz and perform a proof-of-concept experiment for a LiDAR 3D point cloud image recognition task operating at 20 GHz, achieving a recognition accuracy of 97.06%. The proposed 3D-TPE is anticipated to facilitate high-order tensor convolutions, playing an important role in autonomous driving, healthcare, video analytics, virtual reality, etc.

Physical sciences/Optics and photonics/Applied optics/Integrated optics
Physical sciences/Optics and photonics/Applied optics/Optoelectronic devices and components

Introduction

Deep learning-driven artificial intelligence (AI) has achieved significant breakthroughs in data-intensive tasks [1]. Convolutional neural networks, which extract data features through layers of convolutional filters, are among the most powerful tools in deep learning [2]. Propelled by the development of smart sensors [3, 4], information extraction now occurs across time, space, frequency, and other parameter spaces, forming high-order tensors [5].
While 2D CNNs process data on a 2D plane, 3D CNNs leverage 3D kernels to explore internal relationships within tensors across spatial and temporal dimensions [6], playing a crucial role in 3D medical image segmentation [7–9], video analysis [10, 11], autonomous driving [12, 13], and other fields [14–16]. However, the cubic increase in computation and memory overheads in 3D CNNs presents significant challenges to hardware processing capabilities.

With the exponential growth of AI models, high-speed and energy-efficient hardware accelerators are urgently desired [17, 18]. Electronic computing accelerators, such as Graphics Processing Units (GPUs) [19], Tensor Processing Units (TPUs) [20], memristor crossbar arrays [21], and Field-Programmable Gate Arrays (FPGAs) [22], have been extensively developed to meet computing power requirements. However, most of these hardware accelerators are optimized for 2D MVMs, which may not be optimal for accelerating 3D CNNs [23]. In addition, Joule heating and operation bandwidth limitations inherent in electronic components hinder further improvements in computing speed and efficiency [24]. Photonic processors mitigate data transfer bandwidth bottlenecks caused by capacitor charging and discharging processes in electronic processors, achieving processing bandwidths exceeding hundreds of gigahertz [25], and offer advantages of low latency, minimal power dissipation, and high degrees of modulation freedom across wavelengths, waveguide modes, polarization, time, and space [26]. Therefore, photonic computing accelerators are promising candidates to address the information processing challenges of data-intensive applications.

Various photonic tensor processors have been demonstrated.
Utilizing singular value decomposition theory, 2D MVM operations based on Mach-Zehnder interferometer (MZI) mesh have been validated for CNN acceleration [27, 28], but the complexity of coherent optical phase tuning remains a significant challenge [29, 30]. Photonic tensor processors based on wavelength division multiplexing (WDM) technology mitigate coherent phase errors by encoding data onto individual wavelengths [31–34], which greatly improves parallel computing speeds. To further enhance parallelism, hybrid modulation combining WDM and radio frequency (RF) continuous wave signals [35] and partial coherent optical modulation [36] have been explored. Additionally, by utilizing optical delay lines or single-mode fibers, data caching has been performed in the optical domain [37–39].

Despite these remarkable achievements in photonic computing acceleration, the aforementioned schemes are primarily optimized for 2D MVMs. When performing 3D tensor convolutions, as illustrated in the “2D MVM” section of Fig. 1a, the 2D MVM hardware accelerators suffer from several limitations:
(ⅰ) 3D tensors need to be reshaped (or sliced) in the electrical domain according to the size of accelerators before performing computation in the optical domain, which involves large-scale data replication and reordering, leading to significant memory and time overheads. The bottleneck between memory and computing units is not effectively broken down.
(ⅱ) Synchronization between multi-channel high-speed signals relies on external electronic clocks. As the size of photonic accelerators scales, system complexity increases dramatically, along with the increased number of high-speed modulators, digital-to-analog converters (DACs), and analog-to-digital converters (ADCs), leading to increased energy consumption and costs.
(ⅲ) Current schemes with optical data caching are based on fixed delay lines or single-mode fibers, which cannot adapt to varying data caching and symbol rate requirements in time-sparse and time-dense application scenarios.

In this work, we propose an integrated photonic 3D tensor processing engine (3D-TPE) that combines two optical memory units (OMUs) and an optical computing unit (OCU), as shown in the “3D-TPE” part of Fig. 1a. Based on the interleaving modulation of wavelength, time, and space domains, data caching, synchronization, and computation are performed simultaneously in the optical domain, eliminating the external memory and time overheads during data reshaping and clock synchronization in the electrical domain. Only one modulator, one ADC, and one DAC are required in the 3D-TPE, reducing the number of high-speed devices by orders of magnitude and also saving cost and energy. The OMU consists of an integrated eight-channel tunable delay line chip with a tuning resolution of 4.93 ps, which enables adaptive clock frequency adjustment for diverse applications with clock frequencies up to ~200 GHz. The OCU is a dual-coupled-MRRs crossbar chip, which is employed for the implementation of parallel dot products via optical intensity modulation. Compared to the single-MRR-based weighting elements (WEs) used in other schemes, the dual-coupled-MRRs WE exhibits a flat spectral response and a larger optical bandwidth, minimizing signal distortion at high symbol rates and enhancing resistance to wavelength shifts of the light source. In proof-of-concept experiments, four-channel matrix multiplication operations at clock frequencies ranging from 10 GHz to 30 GHz were demonstrated. A LiDAR 3D point cloud image recognition task was also performed at a symbol rate of 20 GBaud, achieving a recognition accuracy of 97.06%, which is comparable to digital results.
The proposed integrated photonic 3D-TPE demonstrates significant potential for general-purpose 3D tensor convolution operations and 3D CNN acceleration, and is expected to play an important role in autonomous driving, real-time video analysis, 3D medical image processing, and other applications.

Results

Processing flow of the integrated photonic 3D-TPE

Figure 1b illustrates the schematic of the proposed integrated photonic 3D-TPE and its working principle for performing convolution on a 3D point cloud image. A 3D convolutional kernel performs 3D convolutions on the point cloud image within its coverage area, generating a 3D convolutional feature map by sliding the convolutional kernel in each of the three spatial directions. In the optical implementation, input image data is sequentially loaded onto the modulator and replicated across both wavelength and spatial dimensions. After passing through the optical tunable delay line array (marked as OMU), the delayed replicas are weighted by the dual-coupled-MRRs crossbar circuit (marked as OCU), where the weights of the 3D kernel are deployed. The outputs of the OCU are fed into another OMU and converted into photocurrents by photodetectors (PDs). Finally, the outputs of the PDs are summed by an electrical power combiner, and the 3D feature map is obtained by sampling the output waveforms.

The mathematical expression and optical implementation of the 3D convolution operation are shown in Fig. 2. A 3D matrix A, with a size of \((I, J, K)\), where \(I \times J \ge K\), is sequentially modulated onto multi-wavelength optical carriers by an intensity modulator with a symbol duration of \(\Delta t\), where the carriers have \(I \times J\) individual operating wavelengths. After optical intensity modulation, the signals are evenly divided into \(K\) paths, accomplishing \(I \times J \times K\) signal replications across both wavelength and spatial dimensions.
By controlling the switching states of MZIs in the tunable optical delay line to select different optical paths, the split optical signals are delayed with time intervals of \(\Delta t\), which is equal to the modulated symbol duration, achieving delay times ranging from \(0\) to \((K-1)\Delta t\) for the \(K\) paths. The delayed multi-wavelength optical signals enter the dual-coupled-MRRs crossbar circuit through horizontal bus waveguides and are subsequently combined into vertical bus waveguides through the filtering effect of the MRRs. The dual-coupled-MRRs crossbar circuit is configured with \(K\) rows and \(I \times J\) columns, with weight data of a 3D matrix B deployed on it. The dual-coupled-MRRs structure serves as the basic weighting element, capable of wavelength filtering and optical weighting functions simultaneously. Each dual-coupled-MRRs WE only processes one wavelength channel. Additionally, there is no WE working on the same wavelength in any row or column, thus preventing optical amplitude fluctuations from interference among signals of identical wavelength. After being weighted by the WEs, the outputs of the crossbar circuit are fed into another tunable optical delay line array with time intervals of \(K\Delta t\), so that different wavelengths from distinct output paths experience time delays ranging from \(0\) to \((I \times J \times K - 1)\Delta t\) with equal time intervals of \(\Delta t\). PD arrays and an electrical power combiner are used to perform the accumulation. Finally, the 3D convolutional processing results are obtained by sampling the output waveform at a time interval of \((I \times J \times K)\Delta t\).
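The delay, weight, delay, and sum flow described above behaves like a finite impulse response filter whose taps are the kernel weights. The following Python sketch is one reading of that description, not the authors' implementation. It assumes ideal, noiseless intensity weighting, and it assumes that the WE in row k and column c sees a total delay of (k + cK) symbol periods (k from the first delay line array, cK from the second). The function names `tpe_output` and `sample_feature_values` are hypothetical. Under these assumptions, sampling the summed waveform once every I×J×K symbols yields one dot product per block of input symbols.

```python
import numpy as np


def tpe_output(x, weights):
    """Sum of delayed, weighted copies of the symbol stream x.

    weights has shape (K, C) with C = I*J columns. Assumption: the element in
    row k, column c is delayed by k symbols in the first delay-line array and
    by c*K symbols in the second, i.e. (k + c*K) symbols in total.
    """
    K, C = weights.shape
    n_taps = K * C
    y = np.zeros(len(x) + n_taps - 1)
    for k in range(K):
        for c in range(C):
            d = k + c * K  # total delay in symbol periods
            y[d:d + len(x)] += weights[k, c] * x
    return y


def sample_feature_values(y, n_taps):
    """Keep one sample every n_taps symbols, starting once all taps are filled."""
    return y[n_taps - 1::n_taps]


rng = np.random.default_rng(0)
I, J, K = 2, 2, 2
n_taps = I * J * K
x = rng.random(5 * n_taps)            # five blocks of input symbols
weights = rng.random((K, I * J))      # kernel weights laid out on the crossbar

measured = sample_feature_values(tpe_output(x, weights), n_taps)

# Reference: each block of n_taps symbols dotted with the taps in reverse
# order, because the tap with the longest delay meets the earliest symbol.
taps = weights.T.reshape(-1)          # taps[k + c*K] == weights[k, c]
reference = x.reshape(-1, n_taps) @ taps[::-1]
print(np.allclose(measured, reference))
```

The reversed tap order in the reference is a consequence of the delay assignment assumed here; a different mapping of kernel entries to rows and columns would only permute the taps.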
Furthermore, by utilizing the spectral repetition properties in various resonance orders of the MRR, the outputs of the crossbar circuit can be combined by an optical power combiner and accumulated by a PD, enabling all-optical 3D convolutions, as described in more detail in supplementary note 1.

Dual-coupled-MRRs based crossbar OCU

The OCU used in this work is a 4×4 dual-coupled-MRRs photonic crossbar chip, fabricated on a multilayer Si₃N₄-on-SOI platform. Figure 3a shows the microscope image of the fabricated chip with a footprint of \(2\,\text{mm} \times 3.17\,\text{mm}\). It consists of 16 WEs, each implemented as a 3D dual-coupled-MRRs structure, as depicted in Fig. 3b. The dual-coupled MRRs are constructed on the middle Si₃N₄ layer and are coupled to the bottom silicon waveguide and the top Si₃N₄ waveguide, respectively, forming an add-drop filter configuration. The lower refractive index contrast (~0.56) and lower thermo-optic coefficient of the Si₃N₄ waveguide make the MRRs less sensitive to fabrication deviations and ambient temperature variations compared to silicon-based MRRs [40]. This enhances thermal stability and also reduces the complexity of weight tuning. Additionally, the bottom silicon waveguide and top Si₃N₄ waveguide form a 3D waveguide crossing, which can further reduce the crossover loss compared to planar waveguide crossings, thereby minimizing insertion loss differences between weighting channels. More details of the chip design can be found in our previous work [41]. Figure 3c presents the measured spectra of four dual-coupled-MRRs WEs in a column with a channel spacing of 97 GHz. The 3D dual-coupled-MRRs WE exhibits a box-like spectral shape, with an optical 3-dB bandwidth of approximately 50 GHz and crosstalk below −25 dB.
Compared with single-MRR-based WEs, the dual-coupled-MRRs WE offers a larger optical bandwidth to mitigate signal distortion in high-speed signal processing, lower crosstalk from adjacent channels, and flatter spectral characteristics to minimize weighting errors. More details about the comparison can be found in supplementary note 2.

The weight tuning process is shown in Fig. 3d. Weight deployment for each dual-coupled WE is accomplished by shifting the resonance wavelength of one MRR while keeping the other one fixed, which simplifies the weight tuning process and improves thermal stability compared to adjusting both MRRs simultaneously. Figure 3e illustrates the weight-voltage (W-V) relationships of four WEs, obtained by monitoring the optical power at the operating wavelengths and normalizing it from 0 to 1. The sweeping voltage ranges differ among the four WEs because all WEs share the same design parameters, so each WE must first be tuned to its operating wavelength before voltage sweeping. During the weight sweeping process, the weights of adjacent WEs are fixed at 0.5.

The influence of thermal crosstalk on weighting accuracy was evaluated by measuring the weights of a single dual-coupled-MRRs WE while varying the weights of adjacent WEs. In the experiment, 50 randomly generated weight values were applied to the third dual-coupled-MRRs WE in the column, while each of the remaining three WEs was assigned a weight of 0, 0.5, or 1. The resulting \(3^3 = 27\) weight combinations, each with 50 randomly generated weight values, were measured. Figure 3f shows a scatterplot of the commanded weights versus measured weights across the total of 1350 measurements. The inset illustrates weighting errors, with a mean value of 0.001 and a standard deviation of 0.0041. Furthermore, weighting accuracies across the 27 weight combinations were compared, as shown in Fig. 3g, demonstrating equivalent weighting accuracies exceeding 7 bits for all combinations. Given that the W-V relationships were calibrated with weights of adjacent WEs set to 0.5, the equivalent weighting accuracy reached approximately 9.7 bits under initial conditions. As the weights of adjacent WEs vary, the impact of thermal crosstalk increases, leading to a predictable decrease in weighting accuracy. The experimental results demonstrate that the 3D dual-coupled-MRRs WEs, fabricated on a multilayer Si₃N₄-on-SOI platform, can achieve high-precision arbitrary weight tuning without complicated algorithms or feedback control schemes [39, 42–44] due to their ambient temperature insensitivity and flat-top spectral characteristics.

To further verify the parallel computing capabilities of the dual-coupled-MRRs crossbar chip, a 2D CNN was constructed for the MNIST handwritten digit recognition task. Experimental results indicate that the recognition performance of the optical computing chip is comparable to that of a digital computer. More details about the experiment can be found in supplementary note 3.

Tensor processing with tunable clock frequencies

The key device enabling data caching, synchronization, and task-adapted clock frequency reconfiguration in the 3D-TPE is the OMU chip. Figure 4a shows a microscope image of the OMU chip used in this study, which was fabricated on the SOI platform with a chip size of \(3.76\,\text{mm} \times 10.9\,\text{mm}\). It consists of 8 identical optical tunable delay lines (OTDLs), each comprising 7 cascaded MZI switches with a pair of delay waveguides between adjacent switches. The two delay waveguides in each pair are designed with different lengths, resulting in different group delays. The differential group delay of the n-th (n = 1, 2, …, 6) delay waveguide pair is \(2^{n-1}\,\delta t\), where \(\delta t\) represents the delay step. By controlling the states of the cascaded MZIs to alter optical paths, the group delay can be digitally tuned from \(0\) to \(63\,\delta t\).
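The binary-weighted delay selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the chip's control software: the names `stage_settings` and `total_delay_ps` are hypothetical, each stage is idealized as adding either zero or its full differential delay, and the delay step is taken as the 4.93 ps resolution quoted for the OMU chip.

```python
DELAY_STEP_PS = 4.93   # delay resolution reported for the OMU chip
N_STAGES = 6           # six delay-waveguide pairs per delay line


def stage_settings(steps):
    """Return, for each stage, whether the longer waveguide is selected.

    List index n (0-based) corresponds to the (n+1)-th waveguide pair, whose
    differential delay is 2**n delay steps.
    """
    if not 0 <= steps < 2 ** N_STAGES:
        raise ValueError("delay must be between 0 and 63 steps")
    return [bool((steps >> n) & 1) for n in range(N_STAGES)]


def total_delay_ps(settings):
    """Sum the differential delays of all stages routed through the long path."""
    return sum(DELAY_STEP_PS * 2 ** n
               for n, long_path in enumerate(settings) if long_path)


settings = stage_settings(10)          # 10 steps = 2 + 8
print(settings, total_delay_ps(settings))
print(total_delay_ps(stage_settings(63)))
```

Selecting all six long paths gives 63 steps, which at 4.93 ps per step is about 310.59 ps.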
Given that the modulation rate is inversely proportional to the delay time interval between channels, rate-adapted photonic computing can be achieved by adjusting delay differences between channels. The test results indicate that the chip achieves a delay resolution of 4.93 ps and a maximum delay of 310.59 ps. Thus, for four-channel parallel matrix multiplication, the clock frequency can be adaptively tuned over an ultra-wide range from 0.8 GHz to 200 GHz. More details regarding the design and testing of the delay line can be found in supplementary note 4.

Figure 4b illustrates the conceptual experimental setup (more details of the experimental setup can be found in “Methods”). Four channel wavelengths were combined via a WDM and fed into the 3D-TPE. Signals were modulated onto optical carriers via a high-speed modulator driven by an AWG and then split into four paths before entering four OTDLs. The optical tunable delay lines were configured with time intervals of \(\Delta t\) to enable optical caching and channel synchronization. The delayed replicas were then weighted by four dual-coupled-MRRs WEs in a column. The output was monitored by a PD and recorded by an OSC. Variable clock frequencies were realized by varying the symbol rates of the input waveform and the corresponding time interval between optical delay lines. Before performing the calculation, a synchronization calibration between computing channels was performed. Figure 4c shows the four channel waveforms after synchronization calibration, with equal time intervals of ~50 ps. More details of the channel synchronization calibration can be seen in supplementary note 5. Computing accuracies were measured using 120,000 randomly generated data points ranging from 0 to 1 at symbol rates of 10 GBaud, 15 GBaud, 20 GBaud, and 30 GBaud, with fixed weight values of \([0.90, 0.54, 0.61, 0.76]\) across four WEs.
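To make the link between symbol rate and delay settings concrete, the sketch below maps each tested symbol rate to a per-channel delay expressed in 4.93 ps steps. It is an illustration only: it assumes that channel k targets a delay of k symbol periods and that each target is simply rounded to the nearest step. The actual calibration procedure is the one described in supplementary note 5 and is not reproduced here, and `channel_delay_steps` is a hypothetical name.

```python
DELAY_STEP_PS = 4.93   # delay resolution reported for the OMU chip
MAX_STEPS = 63         # largest setting of a six-stage binary delay line


def channel_delay_steps(symbol_rate_gbaud, n_channels=4):
    """Nearest-step delay setting for channels 0..n_channels-1.

    Assumption: channel k should be delayed by k symbol periods.
    """
    symbol_period_ps = 1000.0 / symbol_rate_gbaud
    steps = [round(k * symbol_period_ps / DELAY_STEP_PS)
             for k in range(n_channels)]
    if steps[-1] > MAX_STEPS:
        raise ValueError("required delay exceeds the 63-step tuning range")
    return steps


for rate in (10, 15, 20, 30):
    steps = channel_delay_steps(rate)
    # Residual timing error in ps left by the finite delay resolution.
    residual = [s * DELAY_STEP_PS - k * 1000.0 / rate
                for k, s in enumerate(steps)]
    print(rate, steps, [f"{r:+.2f}" for r in residual])
```

With nearest-step rounding the residual error per channel cannot exceed half a step, about 2.5 ps, and all four tested rates fit within the 63-step range under this assumption.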
Figure 4d presents the output waveforms recorded by a real-time OSC at various symbol rates, which closely align with the theoretical results. Weighting errors between the measured and calculated results were analyzed, with equivalent bits of the standard deviations ranging from 4.1 bits to 4.8 bits, as shown in Fig. 4e. As the symbol rate increases, computing accuracy decreases slightly due to increased signal distortion and transmission noise at higher speeds.

3D LiDAR point cloud image recognition

In recent years, deep learning for point clouds has received increasing interest owing to its broad applications in autonomous driving, computer vision, virtual reality, and other domains [45]. To further validate the proposed 3D-TPE, a two-class 3D LiDAR point cloud image recognition experiment for pedestrians and vehicles was performed at a symbol rate of 20 GBaud. The dataset used in this study is the Sydney urban objects dataset [47], with cases acquired via commercial LiDAR deployed on a moving vehicle. The dataset contains 26 object classes with an uneven distribution across them. To balance the data for recognition, we grouped the bus, car, truck, and van classes into a vehicle class and retained the original pedestrian class. The 340 cases were split into testing and training subsets in a 20/80 ratio. Figure 5a illustrates the architecture of the designed 3D CNN, which consists of a convolution layer with a 3D kernel, a max pooling layer, and two fully connected layers, similar to the well-known VoxNet [13]. The network was pre-trained on a digital computer, with the convolutional kernel weights restricted to the positive real data domain. After training, all parameters of the 3D CNN were determined. During image inference, the 3D convolution was executed on the 3D-TPE.
3D point cloud images, with a size of \(32 \times 32 \times 32\), were sequentially mapped to analog signals and sent to a high-speed modulator with a symbol rate of 20 GBaud. The 3D kernel, with a size of \(2 \times 2 \times 2\), was deployed on the \(4 \times 4\) dual-coupled-MRRs based crossbar chip. Four WEs in a column of the crossbar chip were reused twice to construct eight weight values. A real-time OSC recorded the two output waveforms, while the time delay and accumulation between them were simulated on a digital computer. Feature map data points were obtained by down-sampling the real-time waveforms at a sample rate of \(20/8 = 2.5\) GSa/s. Figure 5b shows the comparison between calculated and measured 3D feature maps at various viewing angles. Across different viewing angles of pedestrians and vehicles, the 3D-TPE successfully captures key features, with minor differences attributed primarily to signal distortion and noise interference during high-speed transmission. Figure 5c illustrates the measured real-time waveform, which closely matches the theoretical one. Figure 5d presents the confusion matrix of the 3D point cloud recognition task, with numbers in the upper left and lower right indicating correct recognition. The experimental recognition accuracy reaches 97.06%, which is equivalent to that of a digital computer. This experiment demonstrates the potential of the proposed 3D-TPE for 3D tensor convolution applications.

Discussion

To address the limitations of 2D MVM accelerators for high-order data processing, a new tensor processing engine architecture is proposed. The proposed 3D-TPE reduces the memory and time overheads associated with the data reshaping process in 2D MVM accelerators when processing high-order data, breaking the bottleneck of data reshaping and transfer between electrical memory units and OCUs.
By introducing OMUs based on on-chip OTDLs, channel synchronization occurs in the optical domain, eliminating the need for external electronic clocks, reducing system complexity, and facilitating large-scale expansion. Furthermore, the adaptive optical tunable delay line configuration accommodates diverse system clock frequency requirements for a wide range of applications. In the OCU, dual-coupled-MRRs WEs based on the multilayer Si₃N₄-on-SOI platform were proposed, which alleviate the bandwidth limitations and ambient temperature sensitivity of Si-based single-MRR WEs. Enabled by the OTDLs, modulation rates from 10 GBaud to 30 GBaud were verified, resulting in a high throughput of \(30 \times 4 \times 4 \times 2\) GOPS = 0.96 TOPS. With advancements in high-speed modulators [48] and large-bandwidth PDs [49], and given that the OTDLs can support ultra-high clock frequencies of 200 GHz, a theoretical throughput of \(200 \times 4 \times 4 \times 2\) GOPS = 6.4 TOPS can be realized, and this will increase further as the OCU and OMU are scaled up.

To further characterize the 3D-TPE, a power efficiency analysis was carried out. The reduced number of modulators, drivers, ADCs, and DACs significantly decreases system complexity as well as power consumption. Additionally, by combining non-volatile phase-change materials (PCMs) with zero static power consumption [50], power consumption for weight-state maintenance can be neglected, further reducing overall power consumption. More details about the power consumption analysis can be found in supplementary note 6.

Currently, the OCU chip, OMU chip, modulator, and PD are all discretely packaged. With advanced monolithic photonic integration technology, all the above devices can be monolithically integrated, significantly reducing architecture size and insertion loss, thus facilitating further improvements in computational density and accuracy. See supplementary note 7 for a detailed analysis of the system loss.
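The throughput figures quoted in this discussion follow from a simple product, reproduced in the short Python check below. The helper name `throughput_tops` is hypothetical, and reading the factor of 2 as one multiplication plus one addition per weighting element per symbol is an assumption based on the usual operation-counting convention; the text itself only gives the product.

```python
def throughput_tops(symbol_rate_gbaud, rows, cols, ops_per_mac=2):
    """Throughput in TOPS for a rows x cols crossbar at the given symbol rate.

    Assumption: each weighting element performs one multiply-accumulate per
    symbol, counted as ops_per_mac operations.
    """
    giga_ops = symbol_rate_gbaud * rows * cols * ops_per_mac
    return giga_ops / 1000.0


print(throughput_tops(30, 4, 4))    # 0.96, the demonstrated 30 GBaud case
print(throughput_tops(200, 4, 4))   # 6.4, the projected 200 GHz case
```

Because the product is linear in the crossbar dimensions, doubling both the row and column counts at a fixed symbol rate would quadruple the figure under the same counting convention.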
Kerr optical frequency combs [51], though currently limited in output power, provide a broad spectrum and are compatible with silicon-based fabrication; with the development of on-chip optical amplifiers [52], they promise a bright future for on-chip laser sources.

In conclusion, we have proposed a 3D-TPE that combines OMU and OCU, enabling data caching, computing, and synchronization in the optical domain. The OMU, based on a tunable optical delay line chip, supports ultra-high clock frequencies up to ~200 GHz and a broad tuning range, meeting the rate requirements of various applications. The OCU, based on a dual-coupled-MRRs crossbar circuit, has a larger 3-dB optical bandwidth of 50 GHz compared with a single-MRR-based weight bank. Weighting accuracies exceeding 7 bits have been achieved for varying weight combinations of adjacent WEs by using a simple look-up-table method. In the proof-of-concept experiments, the processing capability of the 3D-TPE has been demonstrated at modulation rates from 10 GBaud to 30 GBaud, and a binary classification task of 3D LiDAR point cloud images has been performed at a modulation rate of 20 GBaud with a recognition accuracy of 97.06%, which is comparable to that of a digital computer. The proposed integrated photonic 3D-TPE scheme is promising for applications such as autonomous driving, video analysis, medical imaging analysis, and other fields.

Methods

Details for experimental setup

The experimental setup is illustrated in Supplementary Note 8. In the experiment, four-channel continuous wave lasers (OVLINK TSP-1000) with a channel spacing of 100 GHz were combined using a WDM (Oplead 4CH-DWDM) and then transmitted into a high-speed modulator (MXAN-LN-40). The digital signals were converted into analog electrical signals with a symbol duration of \(\Delta t\) by an AWG (Keysight M8195A) and subsequently amplified by an electric amplifier (SHF S807) before being applied to the modulator.
After passing through an EDFA (Amonics AEDFA-23-B-FA) to compensate for optical insertion loss, the modulated optical carriers were equally divided into four channels by a waveshaper (Coherent Waveshaper 4000B) and transmitted into four channels of the OMU chip. A multi-channel voltage source was used to provide voltages to the OMU chip and the OCU chip. The OMU chip was configured with a time interval of \(\Delta t\) between channels, while the weight values were deployed on the OCU chip. After being delayed and weighted, the optical carriers were summed and amplified by a high-speed PD with TIA (Finisar XPRV2021). The output waveforms were finally recorded by a real-time oscilloscope (Tektronix DPO75902SX), and results were obtained by down-sampling at a sampling interval of \(4\Delta t\).

Chip fabrication and packaging

The OCU chip was fabricated on AMF’s multilayer Si₃N₄-on-SOI platform. The thicknesses of the BOX and the top Si layers are 3 µm and 220 nm, respectively. The thicknesses of the two Si₃N₄ layers are both 400 nm. To achieve reduced waveguide transmission loss, the Si₃N₄ waveguide layers undergo a low-pressure chemical vapor deposition (LPCVD) process, and the measured propagation loss of the single-mode Si₃N₄ waveguide with a width of 1 µm is less than 0.83 dB/cm. The OMU chip was also fabricated by AMF based on the standard SOI platform with a BOX thickness of 3 µm and a top silicon thickness of 220 nm. In the back end of the line (BEOL), both chips used two aluminum metal layers as redistribution layers (RDLs), and a passivation layer with a thickness of about 4 µm was used to protect the chip surface from external contamination, moisture, and other factors to improve device reliability. Both chips were packaged by SJTU-Pinghu Institute of Intelligent Optoelectronics.
All the electrodes were wire bonded to printed circuit boards (PCBs) for electrical control. Fiber arrays were then edge-coupled with the chips and fixed by ultraviolet-cured glue. To ensure the temperature stability of the chip, a TEC module was also packaged beneath the chip, allowing temperature control with a resolution of 0.01 °C.

Declarations

Data availability
The data that support the findings of this study are available from the corresponding author upon request.

Code availability
The code used in the present work is available from the authors upon request.

Author contributions
Y.W. and L.L. conceived the research and methods. Y.W. performed the experiment and data processing with assistance from Z.N., Y.X.W., and L.L. Z.N. designed the optical tunable delay line chip. X.L. designed the dual-coupled-MRRs crossbar chip. Y.W. wrote the original manuscript. Y.W., L.L., and L.Z. revised the manuscript. J.C. provided suggestions and feedback during the revisions. L.L. and L.Z. co-supervised the research.

Competing interests
The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to Liangjun Lu or Linjie Zhou.

Acknowledgements
L.L. acknowledges the National Key R&D Program of China (2021YFB2801300). L.Z. acknowledges the funding from the National Natural Science Foundation of China (62090052, 62135010).

References

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning.” Nature. vol. 521, no. 7553, pp. 436–444, 2015. DOI: 10.1038/nature14539.
[2] L. Alzubaidi, J. Zhang, A.J. Humaidi, et al., “Review of deep learning: concepts, CNN architectures, challenges, applications, future directions.” Journal of Big Data. vol. 8, no. 1, p. 53, 2021. DOI: 10.1186/s40537-021-00444-8.
[3] N. Ha, K. Xu, G. Ren, A. Mitchell, and J.Z. Ou, “Machine Learning-Enabled Smart Sensor Systems.” Advanced Intelligent Systems. vol. 2, no. 9, p. 2000063, 2020. DOI: 10.1002/aisy.202000063.
[4] Z. Ballard, “Machine learning and computation-enabled intelligent sensor design.” vol. 3, p. 2021.
[5] N.D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E.E. Papalexakis, and C. Faloutsos, “Tensor Decomposition for Signal Processing and Machine Learning.” IEEE Transactions on Signal Processing. vol. 65, no. 13, pp. 3551–3582, 2017. DOI: 10.1109/TSP.2017.2690524.
[6] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects.” IEEE Transactions on Neural Networks and Learning Systems. vol. 33, no. 12, pp. 6999–7019, 2022. DOI: 10.1109/TNNLS.2021.3084827.
[7] K. Kamnitsas, C. Ledig, V.F.J. Newcombe, et al., “Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation.” Medical Image Analysis. vol. 36, pp. 61–78, 2017. DOI: 10.1016/j.media.2016.10.004.
[8] S. Niyas, S.J. Pawan, M. Anand Kumar, and J. Rajan, “Medical image segmentation with 3D convolutional neural networks: A survey.” Neurocomputing. vol. 493, pp. 397–413, 2022. DOI: 10.1016/j.neucom.2022.04.065.
[9] D. Zhao, Y. Liu, H. Yin, and Z. Wang, “An attentive and adaptive 3D CNN for automatic pulmonary nodule detection in CT image.” Expert Systems with Applications. vol. 211, p. 118672, 2023. DOI: 10.1016/j.eswa.2022.118672.
[10] H. Duan, Y. Zhao, K. Chen, D. Lin, and B. Dai, “Revisiting Skeleton-based Action Recognition.” In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2959–2968. IEEE, New Orleans, LA, USA (2022). DOI: 10.1109/CVPR52688.2022.00298.
[11] G.Z. De Castro, R.R. Guerra, and F.G. Guimarães, “Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps.” Expert Systems with Applications. vol. 215, p. 119394, 2023. DOI: 10.1016/j.eswa.2022.119394.
[12] Z. Meng, X. Xia, R. Xu, W. Liu, and J. Ma, “HYDRO-3D: Hybrid Object Detection and Tracking for Cooperative Perception Using 3D LiDAR.” IEEE Transactions on Intelligent Vehicles. vol. 8, no. 8, pp. 4069–4080, 2023. DOI: 10.1109/TIV.2023.3282567.
[13] D. Maturana and S. Scherer, “VoxNet: A 3D Convolutional Neural Network for real-time object recognition.” In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 922–928. IEEE, Hamburg, Germany (2015). DOI: 10.1109/IROS.2015.7353481.
[14] Y.-Y. Hong and R.A. Pula, “Detection and classification of faults in photovoltaic arrays using a 3D convolutional neural network.” Energy. vol. 246, p. 123391, 2022. DOI: 10.1016/j.energy.2022.123391.
[15] M. Faraji, “An integrated 3D CNN-GRU deep learning method for short-term prediction of PM2.5 concentration in urban environment.” Science of the Total Environment. p. 2022.
[16] Y. Li, H. Zhang, and Q. Shen, “Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network.” Remote Sensing. vol. 9, no. 1, p. 67, 2017. DOI: 10.3390/rs9010067.
[17] “AI and compute,” https://openai.com/index/ai-and-compute/
[18] “AI and efficiency,” https://openai.com/index/ai-and-efficiency/
[19] S. Mittal and S. Vaishay, “A survey of techniques for optimizing deep learning on GPUs.” Journal of Systems Architecture. vol. 99, p. 101635, 2019. DOI: 10.1016/j.sysarc.2019.101635.
[20] N.P. Jouppi, C. Young, N. Patil, et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In: Proceedings of the 44th Annual International Symposium on Computer Architecture. pp. 1–12. ACM, Toronto ON Canada (2017). DOI: 10.1145/3079856.3080246.
[21] P. Yao, H. Wu, B. Gao, et al., “Fully hardware-implemented memristor convolutional neural network.” Nature. vol. 577, no. 7792, pp. 641–646, 2020. DOI: 10.1038/s41586-020-1942-4.
[22] A. Shawahna, S.M. Sait, and A.
El-Maleh, “FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review.” IEEE Access. vol. 7, pp. 7823–7859, 2019. DOI: 10.1109/ACCESS.2018.2890150 . S. Mittal and Vibhu, “A survey of accelerator architectures for 3D convolution neural networks.” Journal of Systems Architecture. vol. 115, p. 102041, 2021. DOI: 10.1016/j.sysarc.2021.102041 . D.A.B. Miller, “Attojoule Optoelectronics for Low-Energy Information Processing and Communications.” Journal of Lightwave Technology. vol. 35, no. 3, pp. 346–396, 2017. DOI: 10.1109/JLT.2017.2647779 . A. Boes, L. Chang, C. Langrock, et al., “Lithium niobate photonics: Unlocking the electromagnetic spectrum.” Science. vol. 379, no. 6627, p. eabj4396, 2023. DOI: 10.1126/science.abj4396 . Z. Wan, H. Wang, Q. Liu, X. Fu, and Y. Shen, “Ultra-Degree-of-Freedom Structured Light for Ultracapacity Information Carriers.” ACS Photonics. vol. 10, no. 7, pp. 2149–2164, 2023. DOI: 10.1021/acsphotonics.2c01640 . Y. Shen, N.C. Harris, S. Skirlo, et al., “Deep learning with coherent nanophotonic circuits.” Nature Photonics. vol. 11, no. 7, pp. 441–446, 2017. DOI: 10.1038/nphoton.2017.93 . H. Zhang, M. Gu, X.D. Jiang, et al., “An optical neural chip for implementing complex-valued neural network.” Nature Communications. vol. 12, no. 1, p. 457, 2021. DOI: 10.1038/s41467-020-20719-7 . S. Pai, Z. Sun, T.W. Hughes, et al., “Experimentally realized in situ backpropagation for deep learning in photonic neural networks.” Science. vol. 380, no. 6643, pp. 398–404, 2023. DOI: 10.1126/science.ade8450 . Y. Zhan, H. Zhang, H. Lin, et al., “Physics-Aware Analytic‐Gradient Training of Photonic Neural Networks.” Laser & Photonics Reviews. vol. 18, no. 4, p. 2300445, 2024. DOI: 10.1002/lpor.202300445 . J. Feldmann, N. Youngblood, M. Karpov, et al., “Parallel convolutional processing using an integrated photonic tensor core.” Nature. vol. 589, no. 7840, pp. 52–58, 2021. DOI: 10.1038/s41586-020-03070-1 . C. Huang, S. Fujisawa, T.F. 
De Lima, et al., “A silicon photonic–electronic neural network for fibre nonlinearity compensation.” Nature Electronics. vol. 4, no. 11, pp. 837–844, 2021. DOI: 10.1038/s41928-021-00661-2 . A. Sludds, S. Bandyopadhyay, Z. Chen, et al., “Delocalized photonic deep learning on the internet’s edge.” p. 2022.. W. Zhang, A. Tait, C. Huang, et al., “Broadband physical layer cognitive radio with an integrated photonic processor for blind source separation.” Nature Communications. vol. 14, no. 1, p. 1107, 2023. DOI: 10.1038/s41467-023-36814-4 . B. Dong, “Higher-dimensional processing using a photonic tensor core with continuous-time data.” Nature Photonics. p. B. Dong, F. Brückerhoff-Plückelmann, L. Meyer, et al., “Partial coherence enhances parallelized photonic computing.” Nature. vol. 632, no. 8023, pp. 55–62, 2024. DOI: 10.1038/s41586-024-07590-y . X. Xu, M. Tan, B. Corcoran, et al., “11 TOPS photonic convolutional accelerator for optical neural networks.” Nature. vol. 589, no. 7840, pp. 44–51, 2021. DOI: 10.1038/s41586-020-03063-0 . S. Xu, J. Wang, S. Yi, and W. Zou, “High-order tensor flow processing using integrated photonic circuits.” Nature Communications. vol. 13, no. 1, p. 7970, 2022. DOI: 10.1038/s41467-022-35723-2 . B. Bai, Q. Yang, H. Shu, et al., “Microcomb-based integrated photonic processing unit.” Nature Communications. vol. 14, no. 1, p. 66, 2023. DOI: 10.1038/s41467-022-35506-9 . X. Li, W. Gao, L. Lu, J. Chen, and L. Zhou, “Ultra-low-loss multi-layer 8 × 8 microring optical switch.” Photonics Research. vol. 11, no. 5, p. 712, 2023. DOI: 10.1364/PRJ.479499 . X. Li, L. Lu, Y. Zhou, W. Bao, J. Chen, and L. Zhou, “Low-Loss and Power-Efficient Polarization-Diversity 4 × 4 Microring Switch on a Multi-Layer Si3N4-on-SOI Platform.” Journal of Lightwave Technology. pp. 1–10, 2024. DOI: 10.1109/JLT.2024.3449432 . W. Zhang, C. Huang, H.-T. Peng, et al., “Silicon microring synapses enable photonic deep learning beyond 9-bit precision.” Optica . vol. 9, no. 5, p. 
579, 2022. DOI: 10.1364/OPTICA.446100 . X. Liu, W. Zhang, J. Cheng, H. Zhou, and J. Dong, “Single-Monitor Calibration for Multiple Microring Synapses.” ACS Photonics . p. acsphotonics.4c00157, 2024. DOI: 10.1021/acsphotonics.4c00157 . J. Cheng, Z. He, Y. Guo, et al., “Self-calibrating microring synapse with dual-wavelength synchronization.” Photonics Research. vol. 11, no. 2, p. 347, 2023. DOI: 10.1364/PRJ.478370 . Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, “Deep Learning for 3D Point Clouds: A Survey.” IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 43, no. 12, pp. 4338–4364, 2021. DOI: 10.1109/TPAMI.2020.3005434 . R. Abbasi, A.K. Bashir, H.J. Alyamani, F. Amin, J. Doh, and J. Chen, “Lidar Point Cloud Compression, Processing and Learning for Autonomous Driving.” IEEE Transactions on Intelligent Transportation Systems. vol. 24, no. 1, pp. 962–979, 2023. DOI: 10.1109/TITS.2022.3167957 . M.D. Deuge, A. Quadros, C. Hung, and B. Douillard, “Unsupervised Feature Learning for Classification of Outdoor 3D Scans.” p. 2013.. C. Han, Z. Zheng, H. Shu, et al., “Slow-light silicon modulator with 110-GHz bandwidth.” Science Advances. vol. 9, no. 42, p. eadi5339, 2023. DOI: 10.1126/sciadv.adi5339 . S.M. Koepfli, M. Baumann, Y. Koyaz, et al., “Metamaterial graphene photodetector with bandwidth exceeding 500 gigahertz.” Science. vol. 380, no. 6650, pp. 1169–1174, 2023. DOI: 10.1126/science.adg8017 . X. Yang, L. Lu, Y. Li, et al., “Non-Volatile Optical Switch Element Enabled by Low‐Loss Phase Change Material.” Advanced Functional Materials. p. 2304601, 2023. DOI: 10.1002/adfm.202304601 . H. Shu, L. Chang, Y. Tao, et al., “Microcomb-driven silicon photonic systems.” Nature. vol. 605, no. 7910, pp. 457–463, 2022. DOI: 10.1038/s41586-022-04579-3 . A. Sobhanan, A. Anthur, S. O’Duill, et al., “Semiconductor optical amplifiers: recent advances and applications.” Advances in Optics and Photonics. vol. 14, no. 3, p. 571, 2022. DOI: 10.1364/AOP.451872 . 
Additional Declarations There is NO Competing Interest. Supplementary Files SupplementaryforIntegratedPhotonic3DTensorProcessingEngine.docx {"props":{"pageProps":{"initialData":{"identity":"rs-5399911","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":385308900,"identity":"eb21927b-2023-4ec2-9689-efcca134cc38","order_by":0,"name":"Liangjun Lu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA8ElEQVRIiWNgGAWjYBACPgbmhgNAmhmIDjBIgMUS8GthY2CEaWFLYJBIIFILlMljAFVNSItEYuOBnztq2Q1u93z+YPnjMAM/e44Bw88deLU0HOw9c5zZ4M7ZbRISCYcZJHveGDD2nsGv5QBv2zFmgxu52xhAWgxu5BgwM7YRsOUvWEvO4w8gLfbEaDnM21YD0sIAdpiBBCEtPA8bDsu2HWCWvJFmJiGRls4jceZZwcFePFr42ZMPf3zbVpfMdyP58WcJG2s5/vbkjQ9+4tECBYeTQSQzMPZ5QIwDBDUwMNTZgUjGD0QoHQWjYBSMgpEHAE7DUhRkjWDKAAAAAElFTkSuQmCC","orcid":"","institution":"Shanghai Jiao Tong 
University","correspondingAuthor":true,"prefix":"","firstName":"Liangjun","middleName":"","lastName":"Lu","suffix":""},{"id":385308901,"identity":"7a2197f1-15fd-445a-a272-e93a42308ce3","order_by":1,"name":"Yue Wu","email":"","orcid":"","institution":"Shanghai Jiao Tong University","correspondingAuthor":false,"prefix":"","firstName":"Yue","middleName":"","lastName":"Wu","suffix":""},{"id":385308902,"identity":"8cebfaa1-383e-4768-a52c-53cd3e3d1c42","order_by":2,"name":"Ziheng Ni","email":"","orcid":"","institution":"Shanghai Jiao Tong University","correspondingAuthor":false,"prefix":"","firstName":"Ziheng","middleName":"","lastName":"Ni","suffix":""},{"id":385308903,"identity":"585e896a-3284-4796-9483-67c076e25922","order_by":3,"name":"Xin Li","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Xin","middleName":"","lastName":"Li","suffix":""},{"id":385308904,"identity":"54e4d46f-8dd8-41c5-bfb2-19f84eaeb9c1","order_by":4,"name":"Yuanxun Wang","email":"","orcid":"","institution":"Shanghai Jiao Tong University","correspondingAuthor":false,"prefix":"","firstName":"Yuanxun","middleName":"","lastName":"Wang","suffix":""},{"id":385308905,"identity":"03b4a66a-0dbd-4a4b-847d-a266d1f6b5ab","order_by":5,"name":"Jianping Chen","email":"","orcid":"","institution":"Shanghai Jiaotong University","correspondingAuthor":false,"prefix":"","firstName":"Jianping","middleName":"","lastName":"Chen","suffix":""},{"id":385308906,"identity":"9553046d-b03e-4e10-a2ef-5fd898f87b6c","order_by":6,"name":"Linjie Zhou","email":"","orcid":"https://orcid.org/0000-0002-2792-2959","institution":"Shanghai Jiao Tong University","correspondingAuthor":false,"prefix":"","firstName":"Linjie","middleName":"","lastName":"Zhou","suffix":""}],"badges":[],"createdAt":"2024-11-06 
06:10:50","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5399911/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5399911/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":73674047,"identity":"05709300-d955-4ac3-af63-858ce02d599a","added_by":"auto","created_at":"2025-01-13 12:54:22","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":3359274,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eIntegrated 3D photonic processing engine\u003c/strong\u003e.\u003cstrong\u003e a \u003c/strong\u003eWhen performing 3D tensor convolutions, 2D MVM photonic accelerators need to slice the 3D volume data into 2D vectors first and realize data caching, synchronization, and sampling of high-speed signals relying on external EMU, electrical clocks, ADCs, and DACs. In the 3D-TPE, data caching, computation, and synchronization are realized in the optical domain by the OMU and OCU, breaking the bottleneck between the EMU and OCU, and reducing the complexity of the system. The blue (black) boxes and arrow lines indicate that the signal is transmitted in the optical (electrical) domain. \u003cstrong\u003eb \u003c/strong\u003eConceptual schematic of the proposed 3D-TPE. The OMU consists of on-chip tunable delay lines for adaptive clock frequency, while the OCU is a dual-coupled-MRRs crossbar circuit for weighting and wavelength routing. 
EMU: electrical memory unit; ADC: analog-to-digital converter; DAC: digital-to-analog converter; OCU: optical computing unit; OMU: optical memory unit; MUX: multiplexer; MOD: modulator; OPS: optical power splitter; PD: photodetector; EPC: electrical power combiner.\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/a5d2ce3ce4eb90882e72436f.png"},{"id":73674046,"identity":"24990242-1ea7-4405-8fef-e42f0f07c2ca","added_by":"auto","created_at":"2025-01-13 12:54:22","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":790840,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eProcessing flow of the 3D-TPE\u003c/strong\u003e.\u003cstrong\u003e \u003c/strong\u003eThe input 3D matrix \u003cem\u003e\u003cstrong\u003eA\u003c/strong\u003e\u003c/em\u003e, with a size of (\u003cem\u003eI,J,K\u003c/em\u003e), is sequentially modulated onto multi-wavelength carriers by an intensity modulator. The 3D kernel matrix \u003cem\u003e\u003cstrong\u003eB\u003c/strong\u003e\u003c/em\u003e, with a size of (\u003cem\u003eI,J,K\u003c/em\u003e), is deployed on the dual-coupled-MRRs based crossbar circuit. The 3D convolution operation is accomplished by controlling the time delay intervals between channels as well as the coding process. ODL: optical delay line.\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/946c6be18056d5c1609d864c.png"},{"id":73674043,"identity":"c16c1969-1f40-4213-a702-10d81ca299fb","added_by":"auto","created_at":"2025-01-13 12:54:19","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":7586731,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003e3D dual-coupled-MRRs crossbar OCU. 
a \u003c/strong\u003eMicroscope image of the fabricated 4×4 crossbar chip.\u003cstrong\u003e b \u003c/strong\u003eSchematic structure of the 3D dual-coupled-MRRs on a multilayer Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e-on-SOI platform. The transmission spectrum can be tuned by adjusting voltages applied to MRRs. \u003cstrong\u003ec \u003c/strong\u003eMeasured transmission spectra of four WEs. \u003cstrong\u003ed\u003c/strong\u003e Measured transmission spectra under different weights. The weight tuning is accomplished by varying the voltage applied to MRR1 while keeping MRR2 fixed. \u003cstrong\u003ee\u003c/strong\u003e Normalized weight-voltage relationship curves. \u003cstrong\u003ef\u003c/strong\u003e Scatterplot between command weights and measured weights for 27 weight combinations. The inset shows the error distribution with a standard deviation of 0.0041 and a mean error of 0.001. \u003cstrong\u003eg \u003c/strong\u003eWeight accuracies corresponding to each of the 27 weight combinations (shown in the legend), respectively.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/5d27d7f43d5c02947130716c.png"},{"id":73674041,"identity":"e7cf6852-290e-4fed-95e6-d9100f8b84a1","added_by":"auto","created_at":"2025-01-13 12:54:18","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":5263979,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOptical tunable delay lines for tunable symbol rates. a\u003c/strong\u003e Microscope image of the fabricated optical tunable delay line chip with a footprint of 3.76 mm × 10.9 mm. \u003cstrong\u003eb \u003c/strong\u003eConceptual experimental setup for 4-channel matrix multiplication.\u003cstrong\u003e c\u003c/strong\u003e A step signal is used to verify delay time intervals of 50 ps across four weighting channels. 
\u003cstrong\u003ed\u003c/strong\u003e Comparison of measured and theoretical waveforms for four-channel matrix multiplication at different symbol rates.\u003cstrong\u003e e\u003c/strong\u003e Computing accuracies at various symbol rates. AWG: arbitrary waveform generator; PC: personal computer; OSC: oscilloscope.\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/25bada92772c9a42f87adf69.png"},{"id":73674044,"identity":"4dfdf984-7b70-435c-a29a-60b517957d86","added_by":"auto","created_at":"2025-01-13 12:54:20","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":5838251,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003e3D point cloud recognition with a 3D CNN.\u003c/strong\u003e \u003cstrong\u003ea\u003c/strong\u003e 3D CNN consists of a convolutional layer, a max pooling layer, and two fully connected layers. The convolution is implemented through the 3D-TPE. \u003cstrong\u003eb \u003c/strong\u003eCalculated and measured 3D feature maps at different viewing angles. Colors indicate the intensity values of point clouds, providing information about surface characteristics like reflectivity and material. \u003cstrong\u003ec\u003c/strong\u003e Comparison between the digitally calculated output waveform and the measured waveform acquired by a real-time oscilloscope. 
\u003cstrong\u003ed\u003c/strong\u003eConfusion matrix of calculated and experimental results, with an accuracy of 97.06%.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/1a3e96474294b35252a56a80.png"},{"id":73675302,"identity":"da9c06de-9915-42c5-8157-37beed126497","added_by":"auto","created_at":"2025-01-13 13:02:35","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":23095618,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/fb2efd7f-5b79-4306-927e-5c40e6a48695.pdf"},{"id":73674048,"identity":"7a08bcc3-a389-4318-aa83-ccf54b0b795d","added_by":"auto","created_at":"2025-01-13 12:54:23","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":5175422,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryforIntegratedPhotonic3DTensorProcessingEngine.docx","url":"https://assets-eu.researchsquare.com/files/rs-5399911/v1/39df2223bc38de778e687c8d.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Integrated photonic 3D tensor processing engine","fulltext":[{"header":"Introduction","content":"\u003cp\u003eDeep learning-driven artificial intelligence (AI) has achieved significant breakthroughs in data-intensive tasks\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e. Convolutional neural networks, which extract data features through layers of convolutional filters, are among the most powerful tools in deep learning\u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e. 
Propelled by the development of smart sensors\u003csup\u003e[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e, information extraction now occurs across time, space, frequency, and other parameter spaces, forming high-order tensors\u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/sup\u003e. While 2D CNNs process data on a 2D plane, 3D CNNs leverage 3D kernels to explore internal relationships within tensors across spatial and temporal dimensions\u003csup\u003e[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e, playing a crucial role in 3D medical image segmentation \u003csup\u003e[\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e, video analysis\u003csup\u003e[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e, autonomous driving\u003csup\u003e[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e, and other fields\u003csup\u003e[\u003cspan additionalcitationids=\"CR15\" citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]\u003c/sup\u003e. However, the cubic increase in computation and memory overheads in the 3D CNNs presents significant challenges to hardware processing capabilities. 
With the exponential growth of AI models, high-speed and energy-efficient hardware accelerators are urgently desired\u003csup\u003e[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eElectronic computing accelerators, such as Graphics Processing Units (GPUs)\u003csup\u003e[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/sup\u003e, Tensor Processing Units (TPUs)\u003csup\u003e[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/sup\u003e, memristor crossbar arrays\u003csup\u003e[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/sup\u003e, and Field-Programmable Gate Arrays (FPGAs)\u003csup\u003e[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/sup\u003e, have been extensively developed to meet computing power requirements. However, most of these hardware accelerators are optimized for 2D MVMs, which may not be optimal for accelerating 3D CNNs \u003csup\u003e[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e. On the other hand, joule heating and operation bandwidth limitations inherent in electronic components hinder further improvements in computing speed and efficiency\u003csup\u003e[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/sup\u003e. 
Photonic processors mitigate data transfer bandwidth bottlenecks caused by capacitor charging and discharging processes in electronic processors, achieving processing bandwidths exceeding hundreds of gigahertz\u003csup\u003e[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/sup\u003e, and offer advantages of low latency, minimal power dissipation, and high degrees of modulation freedom across wavelengths, waveguide modes, polarization, time and space\u003csup\u003e[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e. Therefore, photonic computing accelerators are promising candidates to address the information processing challenges of data-intensive applications.\u003c/p\u003e \u003cp\u003eVarious photonic tensor processors have been demonstrated. Utilizing singular value decomposition theory, 2D MVM operations based on Mach-Zehnder interferometer (MZI) mesh have been validated for CNN acceleration\u003csup\u003e[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/sup\u003e, but the complexity of coherent optical phase tuning remains a significant challenge\u003csup\u003e[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/sup\u003e. Photonic tensor processors based on wavelength division multiplexing (WDM) technology mitigate coherent phase errors by encoding data onto individual wavelengths\u003csup\u003e[\u003cspan additionalcitationids=\"CR32 CR33\" citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e, which greatly improves parallel computing speeds. 
To further enhance parallelism, hybrid modulation combining WDM and radio frequency (RF) continuous wave signals\u003csup\u003e[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e and partial coherent optical modulation\u003csup\u003e[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]\u003c/sup\u003e, have been explored. Additionally, by utilizing optical delay lines or single-mode fibers, data caching has been performed in the optical domain\u003csup\u003e[\u003cspan additionalcitationids=\"CR38\" citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e. Despite these remarkable achievements in photonic computing acceleration, the aforementioned schemes are primarily optimized for 2D MVMs. When performing 3D tensor convolutions, as illustrated in the \u0026ldquo;2D MVM\u0026rdquo; section of Fig.\u0026nbsp;1a, the 2D MVM hardware accelerators suffer from several limitations: (ⅰ) 3D tensors need to be reshaped (or sliced) in the electrical domain according to the size of accelerators before performing computation in the optical domain, which involves large-scale data replication and reordering, leading to significant memory and time overheads. The bottleneck between memory and computing units is not effectively broken down. (ⅱ) Synchronization between multi-channel high-speed signals relies on external electronic clocks. As the size of photonic accelerators scales, system complexity increases dramatically, along with the increased number of high-speed modulators, digital-to-analog converters (DACs), and analog-to-digital converters (ADCs), leading to increased energy consumption and costs. 
(ⅲ) Current schemes with optical data caching are based on fixed delay lines or single-mode fibers, which cannot adapt to varying data caching and symbol rate requirements in time-sparse and time-dense application scenarios.\u003c/p\u003e \u003cp\u003eIn this work, we propose an integrated photonic 3D tensor processing engine (3D-TPE) that combines two optical memory units (OMUs) and an optical computing unit (OCU), as shown in the \u0026ldquo;3D-TPE\u0026rdquo; part of Fig.\u0026nbsp;1a. Based on the interleaving modulation of wavelength, time and space domains, data caching, synchronization and computation are performed simultaneously in the optical domain, eliminating the external memory and time overheads during data reshaping and clock synchronization in the electrical domain. Only one modulator, one ADC, and one DAC are required in the 3D-TPE, significantly reducing the number of high-speed devices by orders of magnitude and also saving cost and energy. The OMU consists of an integrated eight-channel tunable delay line chip with a tuning resolution of 4.93 ps, which promises adaptive clock frequency adjustment for diverse applications with clock frequencies up to ~\u0026thinsp;200 GHz. The OCU is a dual-coupled-MRRs crossbar chip, which is employed for the implementation of parallel dot products via optical intensity modulation. Compared to single-MRR-based weighting element (WE) used in other schemes, the dual-coupled-MRRs WE exhibits a flat spectral response and a larger optical bandwidth, minimizing signal distortion at high-speed symbol rates and enhancing resistance to wavelength shifts of the light source. In proof-of-concept experiments, four-channel matrix multiplication operations at clock frequencies ranging from 10 GHz to 30 GHz were demonstrated. A LiDAR 3D point cloud image recognition task was also performed at a symbol rate of 20 Gbaud, achieving a recognition accuracy of 97.06%, which is comparable to digital results. 
The proposed integrated photonic 3D-TPE demonstrates significant potential for general-purpose 3D tensor convolution operation and 3D CNN acceleration, which is believed to play an important role in autonomous driving, real-time video analysis, 3D medical image processing, and other applications.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eProcessing flow of the integrated photonic 3D-TPE\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;1b illustrates the schematic of the proposed integrated photonic 3D-TPE and its working principle for performing convolution on a 3D point cloud image. A 3D convolutional kernel performs 3D convolutions on the point cloud image within its coverage area, generating a 3D convolutional feature map by sliding the convolutional kernel in each of the three spatial directions. In the optical implementation, input image data is sequentially loaded onto the modulator and replicated across both wavelength and spatial dimensions. After passing through the optical tunable delay line array (marked as OMU), the delayed replicas are weighted by the dual-coupled-MRRs crossbar circuit (marked as OCU), where the weights of the 3D kernel are deployed. The outputs of the OCU are fed into another OMU and converted into photocurrents by photodetectors (PDs). Finally, the outputs of the PDs are summed by an electrical power combiner, and the 3D feature map is obtained by sampling the output waveforms.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe mathematical expression and optical implementation of 3D convolution operation are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e2\u003c/span\u003e. 
A 3D matrix \u003cb\u003eA\u003c/b\u003e, with a size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\left(I,J,K\\right)\\)\u003c/span\u003e\u003c/span\u003e, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:I\\times\\:J\\ge\\:K\\)\u003c/span\u003e\u003c/span\u003e, is sequentially modulated onto multi-wavelength optical carriers by an intensity modulator with a symbol duration of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e, where the carriers have \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:I\\times\\:J\\)\u003c/span\u003e\u003c/span\u003e individual operating wavelengths. After optical intensity modulation, the signals are evenly divided into K paths, accomplishing \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:I\\times\\:J\\times\\:K\\)\u003c/span\u003e\u003c/span\u003e signal replications across both wavelength and spatial dimensions. By controlling the switching states of MZIs in the tunable optical delay line to select different optical paths, the split optical signals are delayed with time intervals of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e, which is equal to the modulated symbol duration, achieving delay times ranging from \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:0\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:(K-1)\\:\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e for the K paths. 
The delayed multi-wavelength optical signals enter the dual-coupled-MRRs crossbar circuit through horizontal bus waveguides and are subsequently combined into vertical bus waveguides through the filtering effect of the MRRs. The dual-coupled-MRRs crossbar circuit is configured with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:K\\)\u003c/span\u003e\u003c/span\u003e rows and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\:I\\times\\:J\\)\u003c/span\u003e\u003c/span\u003e columns, with weight data of a 3D matrix \u003cb\u003eB\u003c/b\u003e deployed on it. The dual-coupled-MRRs structure serves as the basic weighting element, capable of wavelength filtering and optical weighting functions simultaneously. Each dual-coupled-MRRs WE only processes one wavelength channel. Additionally, there is no WE working on the same wavelength in any row or column, thus preventing optical amplitude fluctuations from interference among signals of identical wavelength. After being weighted by the WEs, the outputs of the crossbar circuit are fed into another tunable optical delay line array with time intervals of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:K\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e, so that different wavelengths from distinct output paths perform time delays ranging from 0 to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:(I\\times\\:J\\times\\:\\text{K}-1)\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e with equal time intervals of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e, respectively. PD arrays and an electrical power combiner are used to perform the accumulation. 
Finally, the 3D convolutional processing results are obtained by sampling the output waveform at a time interval of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:(I\\times\\:J\\times\\:\\text{K})\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e. Furthermore, by utilizing the spectral repetition properties in various resonance orders of the MRR, the outputs of the crossbar circuit can be combined by an optical power combiner and accumulated by a PD, enabling all-optical 3D convolutions, as described in supplementary note 1 for more details.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eDual-coupled-MRRs based crossbar OCU\u003c/h3\u003e\n\u003cp\u003eThe OCU used in this work is a 4\u0026times;4 dual-coupled-MRRs photonic crossbar chip, fabricated on a multilayer Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e-on-SOI platform. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003ea shows the microscope image of the fabricated chip with a footprint of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:2mm\\times\\:3.17mm\\)\u003c/span\u003e\u003c/span\u003e. It consists of 16 WEs, each implemented as a 3D dual-coupled-MRRs structure, as depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003eb. The dual-coupled MRRs are constructed on the middle Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e layer, which are coupled to the bottom silicon waveguide and the top Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e waveguide, respectively, forming an add-drop filter configuration. 
The lower refractive index contrast (~\u0026thinsp;0.56) and lower thermo-optic coefficient of the Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e waveguide make the MRRs less sensitive to fabrication deviations and ambient temperature variations compared to silicon-based MRRs\u003csup\u003e[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]\u003c/sup\u003e. This enhances thermal stability and also reduces the complexity of weight tuning. Additionally, the bottom silicon waveguide and top Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e waveguide form a 3D waveguide crossing, which can further reduce the crossover loss compared to planar waveguide crossings, thereby minimizing insertion loss differences between weighting channels. More details of the chip design can be found in our previous work\u003csup\u003e[\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/sup\u003e. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003ec presents the measured spectra of four dual-coupled-MRRs WEs in a column with a channel spacing of 97 GHz. The 3D dual-coupled-MRRs WE exhibits a box-like spectral shape, with an optical 3-dB bandwidth of approximately 50 GHz and crosstalk below \u0026minus;\u0026thinsp;25 dB. Compared with single-MRR-based WEs, the dual-coupled-MRRs WE offers a larger optical bandwidth to mitigate signal distortion in high-speed signal processing, lower crosstalk from adjacent channels, and flatter spectral characteristics to minimize weighting errors. More details about the comparison can be found in supplementary note 2. The weight tuning process is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003ed. 
Weight deployment for each dual-coupled WE is accomplished by shifting the resonance wavelength of one MRR while keeping the other one fixed, which simplifies the weight tuning process and improves thermal stability compared to adjusting both MRRs simultaneously. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003ee shows the weight-voltage (W-V) relationships of four WEs, obtained by monitoring the optical power at the operating wavelengths and normalizing it to the range from 0 to 1. Because all WEs share the same design parameters, each WE must first be tuned to its operating wavelength before the voltage sweep; as a result, the sweeping voltage ranges differ among the four WEs. During the weight sweeping process, the weights of adjacent WEs were fixed at 0.5.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe influence of thermal crosstalk on weighting accuracy was evaluated by measuring the weights of a single dual-coupled-MRRs WE while varying the weights of adjacent WEs. In the experiment, 50 randomly generated weight values were applied to the third dual-coupled-MRRs WE in the column, while each of the remaining three WEs was assigned a weight of 0, 0.5, or 1. Therefore, 27 weight combinations (three possible values for each of the three WEs), each with 50 randomly generated weight values, were measured. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003ef shows a scatterplot of the commanded weights versus the measured weights across all 1350 measurements. The inset illustrates weighting errors, with a mean value of 0.001 and a standard deviation of 0.0041. Furthermore, weighting accuracies across the 27 weight combinations were compared, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003eg, demonstrating equivalent weighting accuracies exceeding 7 bits for all combinations. 
Given that the W-V relationships were calibrated with weights of adjacent WEs set to 0.5, the equivalent weighting accuracy reached approximately 9.7 bits under initial conditions. As the weights of adjacent WEs vary, the impact of thermal crosstalk increases, leading to a predictable decrease in weighting accuracy. The experimental results demonstrate that the 3D dual-coupled-MRRs WEs, fabricated on a multilayer Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e-on-SOI platform, can achieve high-precision arbitrary weight tuning without complicated algorithms or feedback control schemes\u003csup\u003e[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e, \u003cspan additionalcitationids=\"CR43\" citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]\u003c/sup\u003e due to their ambient temperature insensitivity and flat-top spectral characteristics.\u003c/p\u003e \u003cp\u003eTo further verify the parallel computing capabilities of the dual-coupled-MRRs crossbar chip, a 2D CNN was constructed for the MNIST handwritten digit recognition task. Experimental results indicate that the recognition performance of the optical computing chip is comparable to that of a digital computer. More details about the experiment can be found in supplementary note 3.\u003c/p\u003e\n\u003ch3\u003eTensor processing with tunable clock frequencies\u003c/h3\u003e\n\u003cp\u003eThe key device enabling data caching, synchronization, and task-adapted clock frequency reconfiguration in the 3D-TPE is the OMU chip. 
Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003ea shows a microscope image of the OMU chip used in this study, which was fabricated on the SOI platform with a chip size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:3.76mm\\times\\:10.9mm\\)\u003c/span\u003e\u003c/span\u003e. It consists of 8 identical optical tunable delay lines (OTDLs), each comprising 7 cascaded MZI switches with a pair of delay waveguides between adjacent switches. The two delay waveguides in each pair are designed with different lengths, resulting in different group delays. The differential group delay of the \u003cem\u003en\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e (\u003cem\u003en\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1, 2, \u0026hellip;,6) delay waveguide pair is 2\u003csup\u003e\u003cem\u003en\u003c/em\u003e\u0026thinsp;\u0026minus;\u0026thinsp;1\u003c/sup\u003e\u003cem\u003eδt\u003c/em\u003e, where \u003cem\u003eδt\u003c/em\u003e represents the delay step. By controlling the states of the cascaded MZIs to alter optical paths, the group delay can be digitally tuned from 0 to 63\u003cem\u003eδt\u003c/em\u003e. Given that the modulation rate is inversely proportional to the delay time interval between channels, rate-adapted photonic computing can be achieved by adjusting the delay differences between channels. The test results indicate a delay resolution of 4.93 ps, which gives a maximum delay of 63\u0026thinsp;\u0026times;\u0026thinsp;4.93 ps = 310.59 ps. Consequently, for four-channel parallel matrix multiplication, the clock frequency can be adaptively tuned over an ultra-wide range from 0.8 GHz to 200 GHz. 
More details regarding the design and testing of the delay line can be found in supplementary note 4.\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eb illustrates the conceptual experimental setup (more details of the experimental setup can be found in \u0026ldquo;Methods\u0026rdquo;). Four channel wavelengths were combined via a WDM and fed into the 3D-TPE. Signals were modulated onto the optical carriers via a high-speed modulator driven by an AWG and then split into four paths before entering four OTDLs. The OTDLs were configured with time intervals of Δt to enable optical caching and channel synchronization. The delayed replicas were then weighted by four dual-coupled-MRRs WEs in a column. The output was monitored by a PD and recorded by an OSC. Variable clock frequencies were realized by varying the symbol rate of the input waveform and the corresponding time interval between the optical delay lines. Before performing the calculation, a synchronization calibration between computing channels was performed. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003ec shows the four channel waveforms after synchronization calibration, with equal time intervals of ~\u0026thinsp;50 ps. More details of the channel synchronization calibration can be found in supplementary note 5.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eComputing accuracies were measured using 120,000 randomly generated data points ranging from 0 to 1 at symbol rates of 10 GBaud, 15 GBaud, 20 GBaud, and 30 GBaud, with fixed weight values of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\left[\\text{0.90,0.54,0.61,0.76}\\right]\\)\u003c/span\u003e\u003c/span\u003e across four WEs. 
Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003ed presents the output waveforms recorded by a real-time OSC at various symbol rates, which closely align with the theoretical results. The errors between the measured and calculated results were analyzed, with standard deviations corresponding to equivalent precisions of 4.1 bits to 4.8 bits, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003ee. As the symbol rate increases, computing accuracy decreases slightly due to increased signal distortion and transmission noise at higher speeds.\u003c/p\u003e\n\u003ch3\u003e3D LiDAR point cloud image recognition\u003c/h3\u003e\n\u003cp\u003e \u003c/p\u003e \u003cp\u003eIn recent years, deep learning for point clouds has received increasing interest owing to its broad applications in autonomous driving, computer vision, virtual reality, and other domains\u003csup\u003e[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/sup\u003e. To further validate the proposed 3D-TPE, a two-class 3D LiDAR point cloud image recognition experiment for pedestrians and vehicles was performed at a symbol rate of 20 GBaud. The dataset used in this study is the Sydney Urban Objects Dataset\u003csup\u003e[\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]\u003c/sup\u003e, with cases acquired via a commercial LiDAR deployed on a moving vehicle. The dataset contains 26 object classes with an uneven distribution across them. To balance the data for recognition, we grouped the bus, car, truck, and van classes into a vehicle class and retained the original pedestrian class. 
For all 340 cases, the testing and training subsets were split in a 20/80 ratio.\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003ea illustrates the architecture of the designed 3D CNN, which consists of a convolution layer with a 3D kernel, a max pooling layer, and two fully connected layers, similar to the well-known VoxNet\u003csup\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e. The network was pre-trained on a digital computer, with the convolutional kernel weights restricted to the positive real data domain. After training, all hyperparameters of the 3D CNN were determined. During image inference, the 3D convolution was executed on the 3D-TPE. 3D point cloud images, with a size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:32\\times\\:32\\times\\:32\\)\u003c/span\u003e\u003c/span\u003e, were sequentially mapped to analog signals and sent to a high-speed modulator with a symbol rate of 20 GBaud. The 3D kernel, with a size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:2\\times\\:2\\times\\:2\\)\u003c/span\u003e\u003c/span\u003e, was deployed on the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:4\\times\\:4\\)\u003c/span\u003e\u003c/span\u003e dual-coupled-MRRs based crossbar chip. Four WEs in a column of the crossbar chip were reused twice to construct eight weight values. A real-time OSC recorded the two output waveforms, while the time delay and accumulation between them were simulated on a digital computer. Feature map data points were obtained by down sampling the real-time waveforms at a sample rate of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:20/8\\)\u003c/span\u003e\u003c/span\u003e=2.5 GSa/s. 
Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003eb shows the comparison between calculated and measured 3D feature maps at various viewing angles. Across different viewing angles of pedestrians and vehicles, the 3D-TPE successfully captures key features, with minor differences attributed primarily to signal distortion and noise interference during high-speed transmission. Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003ec illustrates the measured real-time waveform, which closely matches the theoretical one. Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003ed presents the confusion matrix of the 3D point cloud recognition task, with numbers in the upper left and lower right indicating correct recognition. The experimental recognition accuracy reaches 97.06%, which is comparable to that of a digital computer. This experiment demonstrates the potential of the proposed 3D-TPE for 3D tensor convolution applications.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eTo address the limitations of 2D MVM accelerators for high-order data processing, a new tensor processing engine architecture is proposed. The proposed 3D-TPE reduces the memory and time overheads associated with the data reshaping process in 2D MVM accelerators when processing high-order data, breaking the bottleneck of data reshaping and transfer between electrical memory units and OCUs. By introducing OMUs based on on-chip OTDLs, channel synchronization occurs in the optical domain, eliminating the need for external electronic clocks, reducing system complexity, and facilitating large-scale expansion. Furthermore, the adaptive optical tunable delay line configuration accommodates the diverse system clock frequency requirements of a wide range of applications. 
In the OCU, dual-coupled-MRRs WEs based on the multilayer Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e-on-SOI platform were proposed, which alleviated the bandwidth limitations and ambient temperature sensitivity of Si-based single-MRR WEs.\u003c/p\u003e \u003cp\u003eEnabled by the OTDLs, modulation rates from 10 GBaud to 30 GBaud were verified, resulting in a high throughput of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:30\\times\\:4\\times\\:4\\times\\:2=\\)\u003c/span\u003e\u003c/span\u003e0.96 TOPS. With advances in high-speed modulators\u003csup\u003e[\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]\u003c/sup\u003e and large-bandwidth PDs\u003csup\u003e[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]\u003c/sup\u003e, and given that the OTDLs can support ultra-high clock frequencies of 200 GHz, a theoretical throughput of 200 × 4 × 4 × 2 = 6.4 TOPS could be realized, and this could be increased further by scaling up the OCU and OMU. To further characterize the 3D-TPE, a power efficiency analysis was conducted. The reduced number of modulators, drivers, ADCs, and DACs significantly decreases system complexity as well as power consumption. Additionally, by incorporating non-volatile phase-change materials (PCMs) with zero static power consumption\u003csup\u003e[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]\u003c/sup\u003e, power consumption for weight-state maintenance can be neglected, further reducing overall power consumption. More details about the power consumption analysis can be found in supplementary note 6. Currently, the OCU chip, OMU chip, modulator, and PD are all discretely packaged. 
With advanced photonic monolithic integration technology, all the above devices could be monolithically integrated, significantly reducing architecture size and insertion loss, thus facilitating further improvements in computational density and accuracy. See supplementary note 7 for a detailed analysis of the system loss. Kerr optical frequency combs\u003csup\u003e[\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]\u003c/sup\u003e, though currently limited in output power, provide a broad spectrum and are compatible with silicon-based fabrication; together with the development of on-chip optical amplifiers\u003csup\u003e[\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]\u003c/sup\u003e, they promise a bright future for on-chip laser sources.\u003c/p\u003e \u003cp\u003eIn conclusion, we have proposed a 3D-TPE that combines OMUs and an OCU, enabling data caching, computing, and synchronization in the optical domain. The OMU, based on a tunable optical delay line chip, supports ultra-high clock frequencies up to ~\u0026thinsp;200 GHz and a broad tuning range, meeting the rate requirements of various applications. The OCU, based on a dual-coupled-MRRs crossbar circuit, has a larger 3-dB optical bandwidth of 50 GHz compared with a single-MRR-based weight bank. Weighting accuracies exceeding 7 bits have been achieved for varying weight combinations of adjacent WEs by using a simple look-up-table method. In the proof-of-concept experiments, the processing capability of the 3D-TPE has been demonstrated at modulation rates from 10 GBaud to 30 GBaud, and a binary classification task of 3D LiDAR point cloud images has been performed at a modulation rate of 20 GBaud with a recognition accuracy of 97.06%, which is comparable to that of a digital computer. 
The proposed integrated photonic 3D-TPE scheme is promising for applications in autonomous driving, video analysis, medical image analysis, and other fields.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e \u003c/div\u003e \u003c/div\u003e\n\n "},{"header":"Methods","content":"\u003ch2\u003eDetails of the experimental setup\u003c/h2\u003e\u003cp\u003eThe experimental setup is illustrated in Supplementary Note 8. In the experiment, four-channel continuous-wave lasers (OVLINK TSP-1000) with a channel spacing of 100 GHz were combined using a WDM (Oplead 4CH-DWDM) and then transmitted into a high-speed modulator (MXAN-LN-40). The digital signals were converted into analog electrical signals with a symbol duration of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e by an AWG (Keysight M8195A) and subsequently amplified by an electrical amplifier (SHF S807) before being applied to the modulator. After passing through an EDFA (Amonics AEDFA-23-B-FA) to compensate for optical insertion loss, the modulated optical carriers were equally divided into four channels by a waveshaper (Coherent Waveshaper 4000B) and transmitted into four channels of the OMU chip. A multi-channel voltage source was used to provide voltages to the OMU chip and the OCU chip. The OMU chip was configured with a time interval of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e between channels, while the weight values were deployed on the OCU chip. After being delayed and weighted, the optical carriers were summed and amplified by a high-speed PD with a TIA (Finisar XPRV2021). 
The output waveforms were finally recorded by a real-time oscilloscope (Tektronix DPO75902SX), and results were obtained by downsampling at a sampling interval of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:4\\varDelta\\:t\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003ch3\u003eChip fabrication and packaging\u003c/h3\u003e\u003cp\u003eThe OCU chip was fabricated on AMF’s multilayer Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e-on-SOI platform. The thicknesses of the BOX and the top Si layers are 3 µm and 220 nm, respectively. The thicknesses of the two Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e layers are both 400 nm. To reduce waveguide transmission loss, the Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e waveguide layers were deposited by low-pressure chemical vapor deposition (LPCVD), and the measured propagation loss of the 1-µm-wide single-mode Si\u003csub\u003e3\u003c/sub\u003eN\u003csub\u003e4\u003c/sub\u003e waveguide is less than 0.83 dB/cm. The OMU chip was also fabricated by AMF based on the standard SOI platform with a BOX thickness of 3 µm and a top silicon thickness of 220 nm. In the back end of the line (BEOL), both chips used two aluminum metal layers as redistribution layers (RDLs), and a passivation layer with a thickness of about 4 µm was used to protect the chip surface from external contamination, moisture, and other factors to improve device reliability. Both chips were packaged by SJTU-Pinghu Institute of Intelligent Optoelectronics. All the electrodes were wire bonded to printed circuit boards (PCBs) for electrical control. Fiber arrays were then edge-coupled with the chips and fixed by ultraviolet-cured glue. 
To ensure the temperature stability of the chip, a TEC module was also packaged beneath the chip, allowing temperature control with a resolution of 0.01 ℃.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch3\u003eData availability\u0026nbsp;\u003c/h3\u003e\n\u003cp\u003eThe data that support the findings of this study are available from the corresponding author upon request.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003eCode availability\u0026nbsp;\u003c/h3\u003e\n\u003cp\u003eThe code used in the present work is available from the authors upon request. \u003c/p\u003e\n\n\u003ch3\u003eAuthor contributions\u003c/h3\u003e\n\u003cp\u003eY.W. and L.L. conceived the research and methods. Y.W. performed the experiment and data processing with assistance from Z.N., Y.X.W., and L.L. Z.N. designed the optical tunable delay line chip. X.L. designed the dual-coupled-MRRs crossbar chip. Y.W. wrote the original manuscript. Y.W., L.L. and L.Z. revised the manuscript. J.C. provided suggestions and feedback during the revisions. L.L. and L.Z. co-supervised the research.\u003c/p\u003e\n\u003ch3\u003eCompeting interests\u003c/h3\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003ch3\u003eAdditional information\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003eSupplementary information\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCorrespondence and requests for materials\u003c/strong\u003e should be addressed to Liangjun Lu or Linjie Zhou.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePeer review information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReprints and permissions information\u003c/strong\u003e\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eL.L. acknowledges the National Key R\u0026amp;D Program of China (2021YFB2801300). L.Z. 
acknowledges the funding from the National Natural Science Foundation of China (62090052, 62135010).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eY. LeCun, Y. Bengio, and G. Hinton, \u0026ldquo;Deep learning.\u0026rdquo; Nature. vol. 521, no. 7553, pp. 436\u0026ndash;444, 2015. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nature14539\u003c/span\u003e\u003cspan address=\"10.1038/nature14539\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eL. Alzubaidi, J. Zhang, A.J. Humaidi, et al., \u0026ldquo;Review of deep learning: concepts, CNN architectures, challenges, applications, future directions.\u0026rdquo; Journal of Big Data. vol. 8, no. 1, p. 53, 2021. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s40537-021-00444-8\u003c/span\u003e\u003cspan address=\"10.1186/s40537-021-00444-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eN. Ha, K. Xu, G. Ren, A. Mitchell, and J.Z. Ou, \u0026ldquo;Machine Learning-Enabled Smart Sensor Systems.\u0026rdquo; Advanced Intelligent Systems. vol. 2, no. 9, p. 2000063, 2020. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/aisy.202000063\u003c/span\u003e\u003cspan address=\"10.1002/aisy.202000063\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZ. Ballard, \u0026ldquo;Machine learning and computation-enabled intelligent sensor design.\u0026rdquo; vol. 3, p. 2021.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eN.D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E.E. Papalexakis, and C. 
Faloutsos, \u0026ldquo;Tensor Decomposition for Signal Processing and Machine Learning.\u0026rdquo; IEEE Transactions on Signal Processing. vol. 65, no. 13, pp. 3551\u0026ndash;3582, 2017. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TSP.2017.2690524\u003c/span\u003e\u003cspan address=\"10.1109/TSP.2017.2690524\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZ. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, \u0026ldquo;A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects.\u0026rdquo; IEEE Transactions on Neural Networks and Learning Systems. vol. 33, no. 12, pp. 6999\u0026ndash;7019, 2022. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TNNLS.2021.3084827\u003c/span\u003e\u003cspan address=\"10.1109/TNNLS.2021.3084827\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eK. Kamnitsas, C. Ledig, V.F.J. Newcombe, et al., \u0026ldquo;Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation.\u0026rdquo; Medical Image Analysis. vol. 36, pp. 61\u0026ndash;78, 2017. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.media.2016.10.004\u003c/span\u003e\u003cspan address=\"10.1016/j.media.2016.10.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS. Niyas, S.J. Pawan, M. Anand Kumar, and J. Rajan, \u0026ldquo;Medical image segmentation with 3D convolutional neural networks: A survey.\u0026rdquo; \u003cem\u003eNeurocomputing\u003c/em\u003e. vol. 493, pp. 397\u0026ndash;413, 2022. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.neucom.2022.04.065\u003c/span\u003e\u003cspan address=\"10.1016/j.neucom.2022.04.065\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD. Zhao, Y. Liu, H. Yin, and Z. Wang, \u0026ldquo;An attentive and adaptive 3D CNN for automatic pulmonary nodule detection in CT image.\u0026rdquo; Expert Systems with Applications. vol. 211, p. 118672, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.eswa.2022.118672\u003c/span\u003e\u003cspan address=\"10.1016/j.eswa.2022.118672\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH. Duan, Y. Zhao, K. Chen, D. Lin, and B. Dai, \u0026ldquo;Revisiting Skeleton-based Action Recognition.\u0026rdquo; In: \u003cem\u003e2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\u003c/em\u003e. pp. 2959\u0026ndash;2968. \u003cem\u003eIEEE\u003c/em\u003e, New Orleans, LA, USA (2022)DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/CVPR52688.2022.00298\u003c/span\u003e\u003cspan address=\"10.1109/CVPR52688.2022.00298\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eG.Z. De Castro, R.R. Guerra, and F.G. Guimar\u0026atilde;es, \u0026ldquo;Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps.\u0026rdquo; Expert Systems with Applications. vol. 215, p. 119394, 2023. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.eswa.2022.119394\u003c/span\u003e\u003cspan address=\"10.1016/j.eswa.2022.119394\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZ. Meng, X. Xia, R. Xu, W. Liu, and J. Ma, \u0026ldquo;HYDRO-3D: Hybrid Object Detection and Tracking for Cooperative Perception Using 3D LiDAR.\u0026rdquo; IEEE Transactions on Intelligent Vehicles. vol. 8, no. 8, pp. 4069\u0026ndash;4080, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TIV.2023.3282567\u003c/span\u003e\u003cspan address=\"10.1109/TIV.2023.3282567\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD. Maturana and S. Scherer, \u0026ldquo;VoxNet: A 3D Convolutional Neural Network for real-time object recognition.\u0026rdquo; In: 2015 \u003cem\u003eIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\u003c/em\u003e. pp. 922\u0026ndash;928. \u003cem\u003eIEEE\u003c/em\u003e, Hamburg, Germany (2015)DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/IROS.2015.7353481\u003c/span\u003e\u003cspan address=\"10.1109/IROS.2015.7353481\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eY.-Y. Hong and R.A. Pula, \u0026ldquo;Detection and classification of faults in photovoltaic arrays using a 3D convolutional neural network.\u0026rdquo; \u003cem\u003eEnergy\u003c/em\u003e. vol. 246, p. 123391, 2022. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.energy.2022.123391\u003c/span\u003e\u003cspan address=\"10.1016/j.energy.2022.123391\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eM. Faraji, \u0026ldquo;An integrated 3D CNN-GRU deep learning method for short-term prediction of PM2.5 concentration in urban environment.\u0026rdquo; Science of the Total Environment. p. 2022..\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eY. Li, H. Zhang, and Q. Shen, \u0026ldquo;Spectral\u0026ndash;Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network.\u0026rdquo; \u003cem\u003eRemote Sensing\u003c/em\u003e. vol. 9, no. 1, p. 67, 2017. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/rs9010067\u003c/span\u003e\u003cspan address=\"10.3390/rs9010067\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u0026ldquo;AI and compute,\u0026rdquo; \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com/index/ai-and-compute/\u003c/span\u003e\u003cspan address=\"https://openai.com/index/ai-and-compute/https://openai.com/index/ai-and-compute/URL\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e:\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u0026ldquo;AI and efficiency,\u0026rdquo; \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com/index/ai-and-efficiency/\u003c/span\u003e\u003cspan address=\"https://openai.com/index/ai-and-efficiency/https://openai.com/index/ai-and-efficiency/URL\" targettype=\"URL\" 
class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e:\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS. Mittal and S. Vaishay, \u0026ldquo;A survey of techniques for optimizing deep learning on GPUs.\u0026rdquo; Journal of Systems Architecture. vol. 99, p. 101635, 2019. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.sysarc.2019.101635\u003c/span\u003e\u003cspan address=\"10.1016/j.sysarc.2019.101635\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eN.P. Jouppi, C. Young, N. Patil, et al., \u0026ldquo;In-Datacenter Performance Analysis of a Tensor Processing Unit.\u0026rdquo; In: \u003cem\u003eProceedings of the 44th Annual International Symposium on Computer Architecture\u003c/em\u003e. pp. 1\u0026ndash;12. \u003cem\u003eACM\u003c/em\u003e, Toronto ON Canada (2017)DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/3079856.3080246\u003c/span\u003e\u003cspan address=\"10.1145/3079856.3080246\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eP. Yao, H. Wu, B. Gao, et al., \u0026ldquo;Fully hardware-implemented memristor convolutional neural network.\u0026rdquo; Nature. vol. 577, no. 7792, pp. 641\u0026ndash;646, 2020. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-020-1942-4\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-1942-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eA. Shawahna, S.M. Sait, and A. El-Maleh, \u0026ldquo;FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review.\u0026rdquo; IEEE Access. vol. 7, pp. 7823\u0026ndash;7859, 2019. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2018.2890150\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2018.2890150\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS. Mittal and Vibhu, \u0026ldquo;A survey of accelerator architectures for 3D convolution neural networks.\u0026rdquo; Journal of Systems Architecture. vol. 115, p. 102041, 2021. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.sysarc.2021.102041\u003c/span\u003e\u003cspan address=\"10.1016/j.sysarc.2021.102041\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD.A.B. Miller, \u0026ldquo;Attojoule Optoelectronics for Low-Energy Information Processing and Communications.\u0026rdquo; Journal of Lightwave Technology. vol. 35, no. 3, pp. 346\u0026ndash;396, 2017. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/JLT.2017.2647779\u003c/span\u003e\u003cspan address=\"10.1109/JLT.2017.2647779\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eA. Boes, L. Chang, C. Langrock, et al., \u0026ldquo;Lithium niobate photonics: Unlocking the electromagnetic spectrum.\u0026rdquo; Science. vol. 379, no. 6627, p. eabj4396, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.abj4396\u003c/span\u003e\u003cspan address=\"10.1126/science.abj4396\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZ. Wan, H. Wang, Q. Liu, X. Fu, and Y. Shen, \u0026ldquo;Ultra-Degree-of-Freedom Structured Light for Ultracapacity Information Carriers.\u0026rdquo; ACS Photonics. vol. 10, no. 7, pp. 
2149\u0026ndash;2164, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1021/acsphotonics.2c01640\u003c/span\u003e\u003cspan address=\"10.1021/acsphotonics.2c01640\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eY. Shen, N.C. Harris, S. Skirlo, et al., \u0026ldquo;Deep learning with coherent nanophotonic circuits.\u0026rdquo; Nature Photonics. vol. 11, no. 7, pp. 441\u0026ndash;446, 2017. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nphoton.2017.93\u003c/span\u003e\u003cspan address=\"10.1038/nphoton.2017.93\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH. Zhang, M. Gu, X.D. Jiang, et al., \u0026ldquo;An optical neural chip for implementing complex-valued neural network.\u0026rdquo; Nature Communications. vol. 12, no. 1, p. 457, 2021. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-020-20719-7\u003c/span\u003e\u003cspan address=\"10.1038/s41467-020-20719-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS. Pai, Z. Sun, T.W. Hughes, et al., \u0026ldquo;Experimentally realized in situ backpropagation for deep learning in photonic neural networks.\u0026rdquo; Science. vol. 380, no. 6643, pp. 398\u0026ndash;404, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.ade8450\u003c/span\u003e\u003cspan address=\"10.1126/science.ade8450\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eY. Zhan, H. Zhang, H. 
Lin, et al., \u0026ldquo;Physics-Aware Analytic‐Gradient Training of Photonic Neural Networks.\u0026rdquo; Laser \u0026amp; Photonics Reviews. vol. 18, no. 4, p. 2300445, 2024. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/lpor.202300445\u003c/span\u003e\u003cspan address=\"10.1002/lpor.202300445\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJ. Feldmann, N. Youngblood, M. Karpov, et al., \u0026ldquo;Parallel convolutional processing using an integrated photonic tensor core.\u0026rdquo; Nature. vol. 589, no. 7840, pp. 52\u0026ndash;58, 2021. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-020-03070-1\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-03070-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eC. Huang, S. Fujisawa, T.F. De Lima, et al., \u0026ldquo;A silicon photonic\u0026ndash;electronic neural network for fibre nonlinearity compensation.\u0026rdquo; Nature Electronics. vol. 4, no. 11, pp. 837\u0026ndash;844, 2021. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41928-021-00661-2\u003c/span\u003e\u003cspan address=\"10.1038/s41928-021-00661-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eA. Sludds, S. Bandyopadhyay, Z. Chen, et al., \u0026ldquo;Delocalized photonic deep learning on the internet\u0026rsquo;s edge.\u0026rdquo; p. 2022..\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eW. Zhang, A. Tait, C. Huang, et al., \u0026ldquo;Broadband physical layer cognitive radio with an integrated photonic processor for blind source separation.\u0026rdquo; Nature Communications. vol. 14, no. 1, p. 1107, 2023. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-023-36814-4\u003c/span\u003e\u003cspan address=\"10.1038/s41467-023-36814-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eB. Dong, \u0026ldquo;Higher-dimensional processing using a photonic tensor core with continuous-time data.\u0026rdquo; Nature Photonics. p.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eB. Dong, F. Br\u0026uuml;ckerhoff-Pl\u0026uuml;ckelmann, L. Meyer, et al., \u0026ldquo;Partial coherence enhances parallelized photonic computing.\u0026rdquo; Nature. vol. 632, no. 8023, pp. 55\u0026ndash;62, 2024. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-024-07590-y\u003c/span\u003e\u003cspan address=\"10.1038/s41586-024-07590-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eX. Xu, M. Tan, B. Corcoran, et al., \u0026ldquo;11 TOPS photonic convolutional accelerator for optical neural networks.\u0026rdquo; Nature. vol. 589, no. 7840, pp. 44\u0026ndash;51, 2021. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-020-03063-0\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-03063-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS. Xu, J. Wang, S. Yi, and W. Zou, \u0026ldquo;High-order tensor flow processing using integrated photonic circuits.\u0026rdquo; Nature Communications. vol. 13, no. 1, p. 7970, 2022. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-022-35723-2\u003c/span\u003e\u003cspan address=\"10.1038/s41467-022-35723-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eB. Bai, Q. Yang, H. Shu, et al., \u0026ldquo;Microcomb-based integrated photonic processing unit.\u0026rdquo; Nature Communications. vol. 14, no. 1, p. 66, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-022-35506-9\u003c/span\u003e\u003cspan address=\"10.1038/s41467-022-35506-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eX. Li, W. Gao, L. Lu, J. Chen, and L. Zhou, \u0026ldquo;Ultra-low-loss multi-layer 8 \u0026times; 8 microring optical switch.\u0026rdquo; Photonics Research. vol. 11, no. 5, p. 712, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1364/PRJ.479499\u003c/span\u003e\u003cspan address=\"10.1364/PRJ.479499\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eX. Li, L. Lu, Y. Zhou, W. Bao, J. Chen, and L. Zhou, \u0026ldquo;Low-Loss and Power-Efficient Polarization-Diversity 4 \u0026times; 4 Microring Switch on a Multi-Layer Si3N4-on-SOI Platform.\u0026rdquo; Journal of Lightwave Technology. pp. 1\u0026ndash;10, 2024. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/JLT.2024.3449432\u003c/span\u003e\u003cspan address=\"10.1109/JLT.2024.3449432\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eW. Zhang, C. Huang, H.-T. 
Peng, et al., \u0026ldquo;Silicon microring synapses enable photonic deep learning beyond 9-bit precision.\u0026rdquo; \u003cem\u003eOptica\u003c/em\u003e. vol. 9, no. 5, p. 579, 2022. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1364/OPTICA.446100\u003c/span\u003e\u003cspan address=\"10.1364/OPTICA.446100\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eX. Liu, W. Zhang, J. Cheng, H. Zhou, and J. Dong, \u0026ldquo;Single-Monitor Calibration for Multiple Microring Synapses.\u0026rdquo; \u003cem\u003eACS Photonics\u003c/em\u003e. p. acsphotonics.4c00157, 2024. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1021/acsphotonics.4c00157\u003c/span\u003e\u003cspan address=\"10.1021/acsphotonics.4c00157\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJ. Cheng, Z. He, Y. Guo, et al., \u0026ldquo;Self-calibrating microring synapse with dual-wavelength synchronization.\u0026rdquo; Photonics Research. vol. 11, no. 2, p. 347, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1364/PRJ.478370\u003c/span\u003e\u003cspan address=\"10.1364/PRJ.478370\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eY. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, \u0026ldquo;Deep Learning for 3D Point Clouds: A Survey.\u0026rdquo; IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 43, no. 12, pp. 4338\u0026ndash;4364, 2021. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TPAMI.2020.3005434\u003c/span\u003e\u003cspan address=\"10.1109/TPAMI.2020.3005434\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eR. Abbasi, A.K. Bashir, H.J. Alyamani, F. Amin, J. Doh, and J. Chen, \u0026ldquo;Lidar Point Cloud Compression, Processing and Learning for Autonomous Driving.\u0026rdquo; IEEE Transactions on Intelligent Transportation Systems. vol. 24, no. 1, pp. 962\u0026ndash;979, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TITS.2022.3167957\u003c/span\u003e\u003cspan address=\"10.1109/TITS.2022.3167957\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eM.D. Deuge, A. Quadros, C. Hung, and B. Douillard, \u0026ldquo;Unsupervised Feature Learning for Classification of Outdoor 3D Scans.\u0026rdquo; p. 2013..\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eC. Han, Z. Zheng, H. Shu, et al., \u0026ldquo;Slow-light silicon modulator with 110-GHz bandwidth.\u0026rdquo; Science Advances. vol. 9, no. 42, p. eadi5339, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/sciadv.adi5339\u003c/span\u003e\u003cspan address=\"10.1126/sciadv.adi5339\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eS.M. Koepfli, M. Baumann, Y. Koyaz, et al., \u0026ldquo;Metamaterial graphene photodetector with bandwidth exceeding 500 gigahertz.\u0026rdquo; Science. vol. 380, no. 6650, pp. 1169\u0026ndash;1174, 2023. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.adg8017\u003c/span\u003e\u003cspan address=\"10.1126/science.adg8017\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eX. Yang, L. Lu, Y. Li, et al., \u0026ldquo;Non-Volatile Optical Switch Element Enabled by Low‐Loss Phase Change Material.\u0026rdquo; Advanced Functional Materials. p. 2304601, 2023. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/adfm.202304601\u003c/span\u003e\u003cspan address=\"10.1002/adfm.202304601\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH. Shu, L. Chang, Y. Tao, et al., \u0026ldquo;Microcomb-driven silicon photonic systems.\u0026rdquo; Nature. vol. 605, no. 7910, pp. 457\u0026ndash;463, 2022. DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-022-04579-3\u003c/span\u003e\u003cspan address=\"10.1038/s41586-022-04579-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eA. Sobhanan, A. Anthur, S. O\u0026rsquo;Duill, et al., \u0026ldquo;Semiconductor optical amplifiers: recent advances and applications.\u0026rdquo; Advances in Optics and Photonics. vol. 14, no. 3, p. 571, 2022. 
DOI:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1364/AOP.451872\u003c/span\u003e\u003cspan address=\"10.1364/AOP.451872\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"light-science-and-applications","isNatureJournal":false,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"lsa","sideBox":"Learn more about [Light: Science \u0026 Applications](http://www.nature.com/lsa/)","snPcode":"41377","submissionUrl":"https://mts-lsa.nature.com/","title":"Light: Science \u0026 Applications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-5399911/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5399911/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eOptical computing leverages high bandwidth, low latency, and power efficiency, which is considered as one of the most effective solutions for accelerating deep learning tasks. However, mainstream photonic hardware accelerators are primarily optimized for two-dimensional (2D) matrix-vector multiplications (MVMs). 
To implement three-dimensional (3D) convolutional neural networks (CNNs), high-order tensors must be reshaped, duplicated, and cached in the electrical domain according to the size of the accelerator before computation, which adds memory and time overhead. Additionally, synchronization across multiple channels depends on external electronic clocks, which increases system complexity. In this work, we propose an integrated photonic 3D tensor processing engine (3D-TPE) based on the interweaving of time, wavelength, and space. Data caching, computation, and synchronization are realized in the optical domain, reducing memory and time usage and simplifying the system. Optical caching and synchronization are achieved with an optical tunable delay line chip supporting versatile clock frequencies up to 200 GHz, and optical computing is accomplished with a crossbar chip based on dual-coupled micro-ring resonators (MRRs) with a 3-dB passband width of 50 GHz. We verify the processing capabilities of the 3D-TPE at clock frequencies ranging from 10 GHz to 30 GHz and perform a proof-of-concept experiment for a LiDAR 3D point cloud image recognition task operating at 20 GHz, achieving a recognition accuracy of 97.06%. 
The proposed 3D-TPE is anticipated to facilitate high-order tensor convolutions, playing an important role in autonomous driving, healthcare, video analytics, virtual reality, etc.\u003c/p\u003e","manuscriptTitle":"Integrated photonic 3D tensor processing engine","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-01-13 12:54:02","doi":"10.21203/rs.3.rs-5399911/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"light-science-and-applications","isNatureJournal":false,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"lsa","sideBox":"Learn more about [Light: Science \u0026 Applications](http://www.nature.com/lsa/)","snPcode":"41377","submissionUrl":"https://mts-lsa.nature.com/","title":"Light: Science \u0026 Applications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"78f87945-7aeb-438f-a6a8-a6b8cbe5014b","owner":[],"postedDate":"January 13th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":41046735,"name":"Physical sciences/Optics and photonics/Applied optics/Integrated optics"},{"id":41046736,"name":"Physical sciences/Optics and photonics/Applied optics/Optoelectronic devices and components"}],"tags":[],"updatedAt":"2025-01-13T12:54:02+00:00","versionOfRecord":[],"versionCreatedAt":"2025-01-13 
12:54:02","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5399911","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5399911","identity":"rs-5399911","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

