ImgFilter
About
I created this personal project to get acquainted with GPGPU programming. Since I have an AMD graphics card, I decided to go with AMD's ROCm platform and use its HIP programming model to build the app. The best-known GPU compute framework is Nvidia's CUDA platform. Intel is new to the graphics card scene, but it is also getting up to speed in heterogeneous computing with oneAPI, which makes use of the SYCL standard.
How it Works
The grayscale and sepia filters are simple because they work on one pixel at a time. The grayscale filter averages the RGB channels, while the sepia filter computes each output channel from the input channels using the equations found on Yabir's Blog.
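As a rough illustration of these per-pixel filters, here is a minimal CPU-side sketch in plain C++. The `Rgb` struct and the function names are made up for this example and are not taken from the project. The sepia coefficients are the commonly circulated ones and are an assumption; the equations on the blog may differ.

```cpp
#include <algorithm>
#include <cstdint>

// One 8-bit RGB pixel. The struct name and layout are illustrative only.
struct Rgb {
    std::uint8_t r, g, b;
};

// Grayscale: average the three channels and write the result to all of them.
inline Rgb grayscale(Rgb p) {
    const auto avg = static_cast<std::uint8_t>((p.r + p.g + p.b) / 3);
    return {avg, avg, avg};
}

// Clamp a float to [0, 255] and round to the nearest integer.
inline std::uint8_t to_byte(float v) {
    const float clamped = std::min(255.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(clamped + 0.5f);
}

// Sepia: each output channel is a weighted sum of all three input channels.
// ASSUMPTION: these are the widely used sepia weights, not necessarily the
// exact equations from the blog the project follows.
inline Rgb sepia(Rgb p) {
    const float r = p.r, g = p.g, b = p.b;
    return {to_byte(0.393f * r + 0.769f * g + 0.189f * b),
            to_byte(0.349f * r + 0.686f * g + 0.168f * b),
            to_byte(0.272f * r + 0.534f * g + 0.131f * b)};
}
```

The clamp matters for sepia because the red and green weights sum to more than 1, so bright pixels would otherwise overflow 8 bits.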
For the box blur and Gaussian blur, a kernel is applied over each pixel in the image. This kernel is a mask, or matrix, of weights. It can be any size, but we usually work with square NxN kernels. The kernel takes the weighted average of the center pixel and its neighbors to determine the center pixel's new value. See an example of a 3x3 kernel for the box blur below.
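In a 3x3 box blur kernel, every one of the nine weights is 1/9, so the output is the plain average of the 3x3 neighborhood. The sketch below applies an NxN kernel to a single-channel image on the CPU. The function names, the row-major `float` layout, and the clamp-to-edge border handling are all assumptions made for this example; the project's HIP kernels may handle these differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply an n x n kernel to a single-channel, row-major image.
// ASSUMPTION: samples outside the image are clamped to the nearest edge pixel.
std::vector<float> convolve(const std::vector<float>& in, int width, int height,
                            const std::vector<float>& kernel, int n) {
    assert(n % 2 == 1);
    assert(static_cast<int>(kernel.size()) == n * n);
    assert(static_cast<int>(in.size()) == width * height);
    const int radius = n / 2;
    std::vector<float> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int ky = -radius; ky <= radius; ++ky) {
                for (int kx = -radius; kx <= radius; ++kx) {
                    const int sx = std::min(std::max(x + kx, 0), width - 1);
                    const int sy = std::min(std::max(y + ky, 0), height - 1);
                    sum += kernel[(ky + radius) * n + (kx + radius)] *
                           in[sy * width + sx];
                }
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}

// n x n box-blur kernel: every weight is 1 / (n * n).
std::vector<float> box_kernel(int n) {
    return std::vector<float>(static_cast<std::size_t>(n) * n,
                              1.0f / static_cast<float>(n * n));
}
```

Calling `convolve(image, width, height, box_kernel(3), 3)` gives a 3x3 box blur. A Gaussian blur would reuse `convolve` with a different set of weights.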

A GPU accelerates these computations so well because each output pixel depends only on values from the input image. No output pixel has to wait for any other output pixel to be computed first. This allows us to launch many GPU threads at once, each computing output pixels independently.
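To make the independence concrete, the sketch below emulates a 2D kernel launch on the CPU with ordinary loops. It is not HIP code. The block and thread indices are passed in as plain parameters, standing in for the built-in index variables a real HIP or CUDA kernel would read, and all function names are hypothetical. Each emulated thread reads only the input and writes only its own output slot, so the loops could run in any order, or all at once, with the same result.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU kernel body: one "thread" converts one
// pixel of an interleaved RGB image to grayscale.
void grayscale_thread(int block_x, int block_y, int thread_x, int thread_y,
                      int block_dim, const std::vector<std::uint8_t>& rgb,
                      std::vector<std::uint8_t>& gray, int width, int height) {
    // Global pixel coordinates derived from block and thread indices.
    const int x = block_x * block_dim + thread_x;
    const int y = block_y * block_dim + thread_y;
    // Edge blocks may hang over the image; those threads do nothing.
    if (x >= width || y >= height) return;
    const int i = y * width + x;
    gray[i] = static_cast<std::uint8_t>(
        (rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]) / 3);
}

// Serial emulation of launching a 2D grid of block_dim x block_dim blocks.
std::vector<std::uint8_t> launch_grayscale(const std::vector<std::uint8_t>& rgb,
                                           int width, int height, int block_dim) {
    assert(static_cast<int>(rgb.size()) == 3 * width * height);
    // Round up so the grid covers the whole image.
    const int grid_x = (width + block_dim - 1) / block_dim;
    const int grid_y = (height + block_dim - 1) / block_dim;
    std::vector<std::uint8_t> gray(static_cast<std::size_t>(width) * height);
    for (int by = 0; by < grid_y; ++by)
        for (int bx = 0; bx < grid_x; ++bx)
            for (int ty = 0; ty < block_dim; ++ty)
                for (int tx = 0; tx < block_dim; ++tx)
                    grayscale_thread(bx, by, tx, ty, block_dim, rgb, gray,
                                     width, height);
    return gray;
}
```

The bounds check is what lets the image size be anything, not just a multiple of the block size.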
Future Improvements
One of the main reasons I wanted to create an image filtering app was to learn more about how edge detection works in the field of computer vision. The Canny Edge Detector is a popular edge detection algorithm, and I already have two of its steps implemented: the grayscale and Gaussian blur filters.
The second improvement that I need to make is to implement a kernel filter that works with separable kernels. Just to get things going, I first implemented a non-separable kernel filter, which works on an NxN kernel. But some kernels can be separated into a horizontal and a vertical filter, resulting in two filters of dimension Nx1 and 1xN. Applying them one after the other cuts the work per pixel from N*N multiplications to 2*N, which speeds up the computation drastically for larger kernels.
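A minimal CPU-side sketch of the two-pass idea follows, assuming a single-channel row-major image and clamp-to-edge borders. The function names are invented for this example. The same 1D kernel is used for both passes, which fits the box blur (`{1/3, 1/3, 1/3}` for 3x3) and a small Gaussian approximation such as `{0.25, 0.5, 0.25}`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One 1D pass over the image. (dx, dy) = (1, 0) filters along rows
// (horizontal pass); (dx, dy) = (0, 1) filters along columns (vertical pass).
// ASSUMPTION: samples outside the image are clamped to the nearest edge pixel.
std::vector<float> convolve_1d(const std::vector<float>& in, int width, int height,
                               const std::vector<float>& kernel, int dx, int dy) {
    const int n = static_cast<int>(kernel.size());
    assert(n % 2 == 1);
    assert(static_cast<int>(in.size()) == width * height);
    const int radius = n / 2;
    std::vector<float> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                const int sx = std::min(std::max(x + k * dx, 0), width - 1);
                const int sy = std::min(std::max(y + k * dy, 0), height - 1);
                sum += kernel[k + radius] * in[sy * width + sx];
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}

// Separable blur: a horizontal pass followed by a vertical pass.
// Each pass does N multiplications per pixel instead of N * N in total.
std::vector<float> separable_blur(const std::vector<float>& in, int width,
                                  int height, const std::vector<float>& kernel) {
    const std::vector<float> tmp = convolve_1d(in, width, height, kernel, 1, 0);
    return convolve_1d(tmp, width, height, kernel, 0, 1);
}
```

The trade-off is an intermediate buffer and a second pass over the image, which is usually well worth it once N grows beyond 3.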

The final improvement will be to implement the same algorithms on the CPU (or get the HIP kernels to run on the CPU) so that I can measure the speedup I get from using the GPU to compute the image filters.