Welcome to my first blog post on the final project for SPO 600. The project requires us to find an open source software package, benchmark a use case with that software, and finally optimize the performance of that use case.
Link to openh264: https://github.com/cisco/openh264
The software package I chose to benchmark and optimize is openh264, an open source implementation of the H.264 video codec. The use case I chose to test was decoding a 1080p video 1000 times. I chose this use case because it lasts for several minutes and also shows how many videos the codec can decode within a short amount of time. I created this use case by modifying the tester shell script included with the package. The name of this shell script is CmdLineExamples.sh and it is in the "testbin" directory.
Test script code:
for ((i = 1; i <= 1000; i = i + 1))
do
../h264dec ../res/jm_1080p_allslice.264 Static-2.yuv
done
Testing of the use case was conducted on two AArch64 (64-bit ARM) machines and two x86_64 machines. The AArch64 machines were aarchie and bbetty. The x86_64 machines were xerxes and a Fedora VM running on my PC, which uses an AMD FX-6300 CPU. The testing method was to run the use case twice on each machine per optimization level (O1 - O3). The stats gathered were the time it took to run the use case, the range of frames decoded per second (fps), and the decode time per video. A word of warning: I recorded the time by running a stopwatch while the use case was running, so the times may be off by a second or two. The fps and decode time figures were gathered by looking at the last 10+ iterations of the loop and noting any patterns that emerged from those numbers.
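As a side note on the stopwatch timing, here is a minimal sketch of how the shell could do the timing itself. The helper name time_runs and the log file name decode.log are hypothetical and not part of openh264. The sketch assumes `date +%s` is available, which it is on typical Linux systems, and it only measures whole seconds.

```shell
# Hypothetical helper, not part of openh264: run a command COUNT times,
# append its output to LOG, and print the elapsed wall-clock seconds.
time_runs() (
    count=$1
    log=$2
    shift 2
    start=$(date +%s)          # seconds since the epoch
    i=1
    while [ "$i" -le "$count" ]; do
        # The decoder may report its stats on stdout or stderr
        # (an assumption), so both streams are captured.
        "$@" >> "$log" 2>&1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
)

# For the use case in this post, the call would look like:
#   time_runs 1000 decode.log ../h264dec ../res/jm_1080p_allslice.264 Static-2.yuv
```

Saving the decoder output to a log also means the per-run numbers can be reviewed afterwards instead of being read off the terminal while the loop is running.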
Results Link: https://gist.github.com/Megawats777/3fa527e980975f8a5254dc9b6c7fe5a4
In terms of the results, several aspects stood out to me. First, as the optimization level dropped toward O1, the aarchie machine saw more and more noticeable decreases in performance, while the xerxes machine barely saw any difference. Second, the xerxes system blew all the other machines out of the water during the tests. For instance, xerxes averaged 23 - 24 frames decoded per second, while the other machines except for aarchie were hovering around the 7 - 8 fps mark. Finally, the performance hit of running in a VM was a lot larger than I expected. This performance hit put the VM results closer to the numbers from the bbetty machine than I had anticipated.
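The fps ranges above were read off the terminal by eye. A rough sketch of how they could be computed instead, assuming the decoder output was saved to a file and that each run printed a line of the form "FPS: <number> fps" (both are assumptions, and the helper name fps_summary is hypothetical):

```shell
# Hypothetical helper: summarize the last N "FPS:" lines of a saved decoder log.
# Assumes each run printed a line of the form "FPS: <number> fps",
# so the value is the second whitespace-separated field.
fps_summary() (
    log=$1
    last=${2:-10}              # how many of the final runs to include
    grep '^FPS:' "$log" | tail -n "$last" | awk '
        {
            v = $2 + 0
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
            sum += v
        }
        END {
            if (NR > 0)
                printf "min=%.2f max=%.2f mean=%.2f\n", min, max, sum / NR
        }'
)

# Example call, with decode.log as a hypothetical log file name:
#   fps_summary decode.log 10
```

Taking only the last few runs mirrors the way the numbers in this post were gathered. Passing a larger count as the second argument would summarize more of the loop.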
Overall, my experience so far has been both confusing and quite fascinating. The confusion came from getting everything set up, as everything from the makefile to the VM had some kind of configuration trouble. But once all that setup was done, it was quite cool to use a shell script to automate a test and receive results, as well as to see how each of the remote machines handles itself under heavy load.
Thank you for reading and I'll see you in stage 2.