Building on the impressive work of u/kcimc below, I was inspired to apply a different method of analysis in Photoshop:
I’ve taken a section of the video and stacked approx. 40 frames together to analyze the background. The gist is that multiple frames from the video are aligned on top of each other, and Photoshop does some math on the pixel values at each position. The three images included are a single normal frame, an average frame where each pixel is the mean of the aligned pixels at that position across all the frames, and a range frame, which is similar in effect to the difference filter (this is the black and white image). The range takes the brightest value at each pixel position and subtracts the darkest value, so a white orb passing over dark ocean in even a single frame will return a bright pixel, and a pixel that changes very little over the course of the video will appear very dark. Additionally, the range image has been brightened to enhance the details.
Explanation here of stack modes: https://helpx.adobe.com/ca/photoshop/using/image-stacks.html
The Average Frame removes the image noise and allows you to better see the wave caps.
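For anyone who wants to try the same idea without Photoshop, here is a rough sketch of the mean and range calculations in Python with NumPy. It is not Photoshop's actual implementation, just the math as described above. The function name `stack_mean_and_range` and the synthetic test frames (a noisy still background with one bright "orb" pixel moving across it) are made up for illustration, and real frames would need to be aligned first.

```python
import numpy as np

def stack_mean_and_range(frames):
    """Return (mean_image, range_image) for aligned grayscale frames of equal shape."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames], axis=0)
    mean_image = stack.mean(axis=0)                       # averages out per-frame noise
    range_image = stack.max(axis=0) - stack.min(axis=0)   # brightest minus darkest per pixel
    return mean_image, range_image

# Synthetic stand-in for ~40 aligned frames (assumption: not real video data).
rng = np.random.default_rng(0)
background = rng.uniform(40, 60, size=(32, 32))           # the unchanging "ocean"
frames = []
for i in range(40):
    frame = background + rng.normal(0, 2, size=background.shape)  # sensor/compression noise
    frame[10, i % 32] = 255                                # bright orb moving along row 10
    frames.append(frame)

mean_image, range_image = stack_mean_and_range(frames)
```

In the result, the row the orb crosses is bright in `range_image`, while pixels that only differ by noise stay dark, and `mean_image` ends up very close to the underlying background.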
What’s the point of all this then? I want to see if the wave caps on the ocean are moving. You can see them as the tiny flecks of white on the water. They should move throughout the entire video, being blown by the wind, and appearing and disappearing as they rise and crest.
However, as this frame stack shows, the entire background of the video is still. Some visual noise has been introduced, as you can see from the difference between the grainy normal image and the smooth mean (average) image, but that noise and the motion of the plane, orbs, and cursor are the only differences between frames.
I’d also like to comment on this page on the Internet Archive, which I think is causing some confusion:
Published on May 19, 2014
Received: 12 March 2014
Posted: 19 May 2014
This is the video description written by the uploader. It wasn’t added by YouTube, and is therefore not credible. That ought to be obvious, but here we are.
It is my opinion, as a professional photo/video editor for 14 years, that this video is an animation composited onto a still image taken from commercially available satellite imagery, such as Google Earth, or possibly source imagery from a provider like Maxar. The coordinates have been composited in as well. I don’t have much experience creating text like this synced to camera movements, but using my imagination I think it’s within the realm of possibility for a skilled VFX artist to sync it to the image being panned, or to write a script that converts the coordinates of the viewing window to a fake GPS coordinate.
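To show how simple such a script could be, here is a minimal sketch that linearly maps a position in the viewing window to a latitude/longitude. This is purely hypothetical: the function name `window_to_latlon`, its parameters, and the bounding-box numbers are my own placeholders, and I’m not claiming this is how the video was made. It also assumes the still image is small enough that a straight linear mapping is good enough.

```python
def window_to_latlon(x_px, y_px, image_width, image_height,
                     north, south, west, east):
    """Map a pixel position in a still image to a (lat, lon) pair.

    Assumes the image edges correspond to the given bounding box and that
    latitude and longitude vary linearly across the image.
    """
    lon = west + (x_px / image_width) * (east - west)
    lat = north - (y_px / image_height) * (north - south)  # y grows downward
    return lat, lon

# Arbitrary placeholder bounds, not taken from the video.
center = window_to_latlon(500, 250, 1000, 500,
                          north=10.0, south=8.0, west=90.0, east=94.0)
print(center)
```

Feed it the center of the viewing window on every frame as the image is panned, and the on-screen readout would update smoothly along with the pan.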