# An FPGA-based Accelerator for Sound Field Rendering with High-order FDTD

Yiyu Tan, RIKEN Center for Computational Science, Japan, tan.yiyu@riken.jp

Toshiyuki Imamura, RIKEN Center for Computational Science, Japan, imamura.toshiyuki@riken.jp

Masaaki Kondo, RIKEN Center for Computational Science, Japan, masaaki.kondo@riken.jp

## 1. INTRODUCTION

The finite-difference time-domain (FDTD) method has become an essential tool in room acoustics because of its high accuracy, ease of implementation, and inherent parallelism. Although much work has been done to reduce its inherent dispersion and the oversampling of spatial grids, the resulting schemes still demand high computing capability and memory bandwidth. Meanwhile, GPUs and FPGAs have been applied in recent years to speed up sound field rendering because they offer much higher parallel computing capability than traditional general-purpose processors [1-4]. In particular, the latest FPGAs provide thousands of hardened floating-point arithmetic units, several megabytes of on-chip block memory, and millions of reconfigurable logic blocks, which make it possible to implement the sound wave equation directly in hardware and to customize the data path to accelerate computation. In this research, an FPGA-based accelerator is developed to speed up sound field rendering with high-order FDTD.

## 2. SYSTEM DESIGN

**Rendering algorithm.** A high-order approximation based on Lagrange polynomial fitting [1] is applied to approximate the second-order partial derivatives in the spatial domain, and the second-order central difference method is employed in the time domain of the wave equation. The update equation of sound pressure is then derived; for example, the 4th-order FDTD scheme is as follows [4].

$$\begin{split} P_{i,j,k}^{n+1} &= \chi^2 \Big[-\frac{1}{12} \big(P_{i-2,j,k}^n + P_{i+2,j,k}^n + P_{i,j-2,k}^n + P_{i,j+2,k}^n + P_{i,j,k-2}^n + P_{i,j,k+2}^n\big) \\ &\quad + \frac{4}{3} \big(P_{i-1,j,k}^n + P_{i+1,j,k}^n + P_{i,j-1,k}^n + P_{i,j+1,k}^n + P_{i,j,k-1}^n + P_{i,j,k+1}^n\big)\Big] + \Big(2 - \frac{15}{2} \chi^2\Big) P_{i,j,k}^n - P_{i,j,k}^{n-1} \end{split}$$

where $P_{i,j,k}^n$ is the sound pressure at node $(i,j,k)$ at time step $n$ and $\chi$ is the Courant number.
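As an illustration only, the 4th-order update can be sketched in plain Python. This is not the OpenCL kernel used in this work: the function name `fdtd4_step`, the helper `zeros`, the nested-list layout, and the treatment of nodes near the grid edge (left at zero) are all assumptions made for the sketch.

```python
def fdtd4_step(p_curr, p_prev, chi):
    """Advance a 3D pressure field by one time step with the 4th-order stencil.

    p_curr, p_prev: nested lists indexed [i][j][k] holding P^n and P^(n-1).
    chi: Courant number.
    Assumption (not from the paper): nodes within two cells of the grid
    edge are not updated and stay at zero.
    """
    nx, ny, nz = len(p_curr), len(p_curr[0]), len(p_curr[0][0])
    c2 = chi * chi
    p_next = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(2, nx - 2):
        for j in range(2, ny - 2):
            for k in range(2, nz - 2):
                # Six neighbours two cells away (weight -1/12).
                far = (p_curr[i - 2][j][k] + p_curr[i + 2][j][k]
                       + p_curr[i][j - 2][k] + p_curr[i][j + 2][k]
                       + p_curr[i][j][k - 2] + p_curr[i][j][k + 2])
                # Six nearest neighbours (weight 4/3).
                near = (p_curr[i - 1][j][k] + p_curr[i + 1][j][k]
                        + p_curr[i][j - 1][k] + p_curr[i][j + 1][k]
                        + p_curr[i][j][k - 1] + p_curr[i][j][k + 1])
                p_next[i][j][k] = (c2 * (-far / 12.0 + 4.0 * near / 3.0)
                                   + (2.0 - 7.5 * c2) * p_curr[i][j][k]
                                   - p_prev[i][j][k])
    return p_next


def zeros(n):
    """Hypothetical helper: an n x n x n grid of zeros."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


# Usage: a unit impulse at the centre of a small 9x9x9 grid.
grid_now, grid_old = zeros(9), zeros(9)
grid_now[4][4][4] = 1.0
grid_new = fdtd4_step(grid_now, grid_old, chi=0.5)
```

A quick sanity check on the coefficients: the twelve neighbour weights sum to $6 \times (4/3 - 1/12) = 15/2$, which cancels the $-\frac{15}{2}\chi^2$ term at the centre node, so a spatially and temporally constant field is left unchanged by the update.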

**System design and implementation.** The system is designed using OpenCL. Spatial blocking is applied to alleviate the external memory bandwidth bottleneck and to reduce the size of the on-chip buffers. Temporal blocking is used to reuse data and to compute the sound pressures of the same spatial block at consecutive time steps. As shown in Figure 1, the system consists of a data input module, a computation engine, and a data output module. The computation engine contains 16 processing elements (PEs). Each PE computes the sound pressures of the grid nodes in a spatial block at one time step, and all PEs are cascaded to compute the sound pressures of the same spatial block at 16 consecutive time steps.
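The cascade of PEs can be pictured with a small functional sketch. It is a deliberately simplified model, not the hardware design: it uses a 1D 2nd-order stencil as a stand-in for the 3D high-order one, omits spatial blocking and halo handling, and the names `pe_step` and `cascade` are hypothetical.

```python
def pe_step(p_curr, p_prev, chi):
    """One processing element: a single time step of a 1D 2nd-order FDTD
    update (a simplified stand-in for the 3D high-order stencil).

    Returns the pair (P^(n+1), P^n) that the next PE consumes.
    """
    c2 = chi * chi
    n = len(p_curr)
    p_next = [0.0] * n  # end nodes held at zero (assumed boundary)
    for i in range(1, n - 1):
        p_next[i] = (c2 * (p_curr[i - 1] + p_curr[i + 1])
                     + (2.0 - 2.0 * c2) * p_curr[i]
                     - p_prev[i])
    return p_next, p_curr


def cascade(p_curr, p_prev, chi, num_pes=16):
    """Chain num_pes PEs: the output of one PE feeds the next, so a block
    that enters the engine once leaves it advanced by num_pes time steps."""
    for _ in range(num_pes):
        p_curr, p_prev = pe_step(p_curr, p_prev, chi)
    return p_curr, p_prev


# Usage: push one block holding a unit impulse through 16 cascaded PEs.
block = [0.0] * 64
block[32] = 1.0
after_16, after_15 = cascade(block, [0.0] * 64, chi=0.5)
```

In this model the block is read once and written once for 16 time steps, which is the data reuse that temporal blocking aims for; in hardware the PEs would operate concurrently as a pipeline rather than in a sequential loop.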

## 3. PERFORMANCE EVALUATION

Table 1 presents the rendering time of the proposed FPGA-based accelerator, implemented on the DE10-Pro FPGA card, and of software simulations performed on a desktop machine with 512 GB of DRAM and a 24-core Intel Xeon Gold 6212U processor running at 2.4 GHz. The sound space was a three-dimensional shoebox with dimensions of 16 m × 8 m × 8 m. The incident signal was an impulse, and 32 time steps were computed. The reference C++ code for the software simulations was compiled with the GNU compiler (version 4.8.5) using the options -O3 and -fopenmp to use all 24 processor cores. As shown in Table 1, the proposed accelerator is about 11, 13, and 18 times faster than the software simulations for the 2nd-order, 4th-order, and 6th-order FDTD schemes, respectively, even though it runs at a much lower clock frequency (about 350 MHz).



Figure 1. System Diagram

Table 1. Rendering Time Per Time Step (s)

| FDTD order | FPGA (s) | Software (s) |
|--------|--------|----------|
| 2nd    | 0.0486 | 0.5363   |
| 4th    | 0.0333 | 0.4458   |
| 6th    | 0.0238 | 0.4437   |
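
The speedup figures quoted in the text follow directly from Table 1 as the ratio of software time to FPGA time. The short script below reproduces that arithmetic; the names `table1` and `speedup` are chosen here for illustration.

```python
# Rendering time per time step in seconds, copied from Table 1
# as (FPGA, software) pairs.
table1 = {
    "2nd": (0.0486, 0.5363),
    "4th": (0.0333, 0.4458),
    "6th": (0.0238, 0.4437),
}


def speedup(fpga_s, software_s):
    """Speedup of the FPGA accelerator over the software simulation."""
    return software_s / fpga_s


speedups = {order: speedup(*times) for order, times in table1.items()}
for order, s in speedups.items():
    print(f"{order}-order: {s:.1f}x")
# prints:
# 2nd-order: 11.0x
# 4th-order: 13.4x
# 6th-order: 18.6x
```

Rounding these ratios down gives the 11, 13, and 18 times stated above.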

## ACKNOWLEDGMENTS

The authors thank Intel for donating the DE10-Pro FPGA card and the related EDA tools through the University Program. This work was supported by JSPS KAKENHI Grant Number JP19K12092.

## REFERENCES

- [1] J. Mourik and D. Murphy. Explicit higher-order FDTD schemes for 3D room acoustic simulation. *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 22[12] (2014), 2003-2011.
- [2] Y. Tan, T. Imamura, and M. Kondo. FPGA-based acceleration of FDTD sound field rendering. *Journal of the Audio Engineering Society*, 69[7/8] (2021), 542-556.
- [3] T. Yiyu, Y. Inoguchi, M. Otani, et al. A real-time sound field rendering processor. *Applied Sciences*, 8[35] (2018).
- [4] Y. Tan, T. Imamura, and M. Kondo. Design and implementation of high-order FDTD method for room acoustics. In Proceedings of *the 41<sup>st</sup> Symposium on Ultrasonic Electronics* (USE 2020), Osaka, November 2020.