Revisit the wavefront kernel in Listing 7.28, which computes the Dynamic Time Warping (DTW) similarity measure. In the following, we investigate its further parallelization potential.
(i) Instead of using texture memory for the subject database, we could manually cache the subject sequences in shared memory in order to speed up the random accesses during relaxation. Implement this approach.
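As a possible starting point for (i), the following is a minimal sketch and not the kernel of Listing 7.28. It assumes one thread block per subject, n + 1 threads per block (so n + 1 <= 1024), squared differences as the local cost, and a layout in which thread l owns column l and walks over the subject entries. The names dtw_wavefront_smem, dtw_smem, and dtw_cpu are hypothetical, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one block per subject, launched with n+1 threads.
// Thread l owns column l of the (n+1) x (n+1) penalty matrix; column 0 is
// the border. Cell (i, l) compares subject[i-1] with query[l-1].
__global__ void dtw_wavefront_smem(const float* query, const float* subjects,
                                   float* dist, int n) {
    extern __shared__ float smem[];
    float* sub = smem;      // n cached subject entries
    float* pen = smem + n;  // latest penalty of each of the n+1 columns

    const int l = threadIdx.x;
    const float* subject = subjects + (size_t)blockIdx.x * n;

    // Cooperative, coalesced copy of the subject into shared memory.
    for (int i = l; i < n; i += blockDim.x) sub[i] = subject[i];

    const float q = (l > 0) ? query[l - 1] : 0.0f;
    float cur  = (l == 0) ? 0.0f : INFINITY;  // M[0][l]
    float diag = INFINITY;
    pen[l] = cur;
    __syncthreads();  // sub[] and pen[] are now visible to the whole block

    for (int k = 1; k <= 2 * n; ++k) {        // k indexes anti-diagonals
        const float left = (l > 0) ? pen[l - 1] : INFINITY;
        __syncthreads();                      // finish reads before overwrite
        const int i = k - l;                  // row handled in this step
        float next = cur;                     // idle threads keep their value
        if (l == 0) {
            next = INFINITY;                  // border M[i][0], i >= 1
        } else if (i >= 1 && i <= n) {
            const float r = sub[i - 1] - q;   // served from shared memory
            next = r * r + fminf(cur, fminf(left, diag));
        }
        diag = left;                          // becomes the diagonal next step
        cur = next;
        pen[l] = cur;
        __syncthreads();                      // publish before the next step
    }
    if (l == n) dist[blockIdx.x] = cur;       // M[n][n]
}

// Host wrapper (assumption: subjects holds m sequences of length n each).
std::vector<float> dtw_smem(const std::vector<float>& query,
                            const std::vector<float>& subjects) {
    const int n = (int)query.size();
    assert(n >= 1 && n + 1 <= 1024 && subjects.size() % n == 0);
    const int m = (int)(subjects.size() / n);
    float *Q, *S, *D;
    cudaMalloc(&Q, n * sizeof(float));
    cudaMalloc(&S, (size_t)m * n * sizeof(float));
    cudaMalloc(&D, m * sizeof(float));
    cudaMemcpy(Q, query.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(S, subjects.data(), (size_t)m * n * sizeof(float),
               cudaMemcpyHostToDevice);
    dtw_wavefront_smem<<<m, n + 1, (2 * n + 1) * sizeof(float)>>>(Q, S, D, n);
    std::vector<float> dist(m);
    cudaMemcpy(dist.data(), D, m * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(Q); cudaFree(S); cudaFree(D);
    return dist;
}

// Sequential reference with the same cost function, for validation only.
float dtw_cpu(const float* subject, const float* query, int n) {
    std::vector<float> M((n + 1) * (n + 1), INFINITY);
    M[0] = 0.0f;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j) {
            const float r = subject[i - 1] - query[j - 1];
            M[i * (n + 1) + j] = r * r + fminf(M[(i - 1) * (n + 1) + j],
                fminf(M[i * (n + 1) + j - 1], M[(i - 1) * (n + 1) + j - 1]));
        }
    return M[n * (n + 1) + n];
}
```

Calling dtw_smem(query, subjects) returns one DTW score per subject, which can be compared against dtw_cpu up to rounding differences. In this sketch the penalties are still exchanged through shared memory with two barriers per anti-diagonal; only the subject accesses have moved from texture memory to the shared-memory cache.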
(ii) During the update of a cell, each thread t_l has to read two entries already processed by the previous thread t_{l-1} (left and diagonal) and one entry already processed by the same thread (above), as shown in Fig. 7.18. Implement a wavefront kernel for DTW that performs the intra-warp communication using warp intrinsics instead of shared memory. For simplicity, assume that the length of the time series is exactly n = 31.
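For (ii), one way to read the hint n = 31 is that the 31 columns plus one border column fill exactly the 32 lanes of a warp. The sketch below follows this idea under several assumptions: one warp per subject, squared differences as the local cost, the subject read directly from global memory, and lane l owning column l. The names dtw_warp_shuffle, dtw_shuffle, and dtw_cpu are hypothetical, error checking is omitted, and the code is not the kernel of Listing 7.28.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 31;                      // series length fixed by (ii)
constexpr unsigned FULL_MASK = 0xFFFFFFFFu;

// Hypothetical kernel: one warp (a block of 32 threads) per subject.
// Lane l owns column l of the 32 x 32 penalty matrix; lane 0 is the border.
__global__ void dtw_warp_shuffle(const float* query, const float* subjects,
                                 float* dist) {
    const int lane = threadIdx.x;
    const float* subject = subjects + (size_t)blockIdx.x * N;

    const float q = (lane > 0) ? query[lane - 1] : 0.0f;
    float cur  = (lane == 0) ? 0.0f : INFINITY;  // M[0][lane], the "above"
    float diag = INFINITY;

    for (int k = 1; k <= 2 * N; ++k) {           // k indexes anti-diagonals
        // Latest value of the left neighbor; executed by all 32 lanes.
        const float left = __shfl_up_sync(FULL_MASK, cur, 1);
        const int i = k - lane;                  // row handled in this step
        float next = cur;                        // idle lanes keep their value
        if (lane == 0) {
            next = INFINITY;                     // border M[i][0], i >= 1
        } else if (i >= 1 && i <= N) {
            const float r = subject[i - 1] - q;
            next = r * r + fminf(cur, fminf(left, diag));
        }
        diag = left;                             // becomes the diagonal next step
        cur = next;
    }
    if (lane == N) dist[blockIdx.x] = cur;       // M[N][N]
}

// Host wrapper (assumption: subjects holds m sequences of length N each).
std::vector<float> dtw_shuffle(const std::vector<float>& query,
                               const std::vector<float>& subjects) {
    assert(query.size() == N && subjects.size() % N == 0);
    const int m = (int)(subjects.size() / N);
    float *Q, *S, *D;
    cudaMalloc(&Q, N * sizeof(float));
    cudaMalloc(&S, (size_t)m * N * sizeof(float));
    cudaMalloc(&D, m * sizeof(float));
    cudaMemcpy(Q, query.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(S, subjects.data(), (size_t)m * N * sizeof(float),
               cudaMemcpyHostToDevice);
    dtw_warp_shuffle<<<m, 32>>>(Q, S, D);
    std::vector<float> dist(m);
    cudaMemcpy(dist.data(), D, m * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(Q); cudaFree(S); cudaFree(D);
    return dist;
}

// Sequential reference with the same cost function, for validation only.
float dtw_cpu(const float* subject, const float* query) {
    std::vector<float> M((N + 1) * (N + 1), INFINITY);
    M[0] = 0.0f;
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j) {
            const float r = subject[i - 1] - query[j - 1];
            M[i * (N + 1) + j] = r * r + fminf(M[(i - 1) * (N + 1) + j],
                fminf(M[i * (N + 1) + j - 1], M[(i - 1) * (N + 1) + j - 1]));
        }
    return M[N * (N + 1) + N];
}
```

Each lane keeps the "above" entry in the register cur, obtains "left" with a single __shfl_up_sync per anti-diagonal, and reuses the previous step's "left" as "diagonal", so neither shared memory nor block-wide barriers are needed. The shuffle is placed outside the conditional so that all lanes named in the mask execute it.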