Hi John,
So I spent a large part of this weekend on your site. It's quite interesting, as it has several edge cases that were not handled properly, or not well enough.
Regarding the image selection for OpenGraph, and why it would not pick up that particular image, the following was happening:
- these "invisible" images were linked using a fully qualified URL (i.e. http + full domain instead of the usual /images/[redacted]x).
- that should have been no problem (it's faster to read local images, but we can also read remote ones), but they were linked using the http version of the site instead of the https version (that's the big problem when you use fully qualified links to pages and images)
- that should not have been a problem either, as our code follows redirects, BUT when trying to load some of those image files to determine their size (needed to know whether they can be used for OGP), there was an error of some kind, so those images were discarded as invalid because we did not know their width and height
I fixed that by switching the library used to read remote image dimensions to one that uses a different method, and this one reads your images' dimensions all right. I believe the error is linked to how the files were created, and the previous library was probably not handling the file headers correctly.
Anyway that's solved and your site currently runs a version that properly reads the size of your images and therefore can make a decision on which to use.
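
For illustration only, here is a minimal Python sketch of the general idea of reading an image's dimensions from its file header. It is not the 4SEO code and not the library mentioned above; the function name png_dimensions is made up for this example, and it only handles PNG files. It shows why a reader that is strict about the header layout returns no size at all when a file deviates from what it expects.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG, or None.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    ("IHDR"), then width and height as big-endian 32-bit integers.
    """
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        return None  # not a PNG, or not enough bytes were downloaded
    if header[12:16] != b"IHDR":
        return None  # first chunk is not IHDR: unexpected header layout
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

With a reader like this, only the first few bytes of a remote file need to be fetched, and any image for which it returns None has an unknown size and cannot be considered for OGP.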
You'll note that on some pages the "top" image (500 x 500 px) is not always the one selected. Instead, a larger one from the article can be used if found. We can talk about that on specific examples. I have found that the choices are generally OK, and that using the 500 x 500 image is usually not the best option, as it's cut off when shared on Facebook (desktop), or you only get a small thumbnail when sharing on Facebook mobile.
It's also possible to switch to using the first image in article instead of the largest, but it's a hidden setting for now as it's not a good option generally speaking.
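
The two selection strategies can be sketched roughly as follows in Python. This is a simplified illustration, not the actual 4SEO logic: the function name pick_image, the strategy values and the tuple layout are all assumptions made for the example.

```python
def pick_image(images, strategy="largest"):
    """Pick an image from a list of (url, width, height) tuples.

    `images` is in document order; width/height are None when the
    dimensions could not be read. Such images are skipped.
    """
    usable = [img for img in images if img[1] and img[2]]
    if not usable:
        return None
    if strategy == "first":
        return usable[0]
    # Default: the image with the largest area (first one wins on ties).
    return max(usable, key=lambda img: img[1] * img[2])
```

For a page with a 500 x 500 top image followed by a 1200 x 630 image in the article body, the default strategy returns the second one, while strategy="first" returns the top image.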
Once this was fixed, I ran the analysis on the site and found that it got stuck at some point, constantly analysing a few pages, then adding a few more, and so on.
I added some logging and found out what was happening. It's one of those edge cases I mentioned:
- you have a number of articles that are also linked from other articles using fully qualified (absolute) URLs
- however, and again, these URLs are not always correct: many times, the links are http instead of https and, in a few cases, the links are missing the www prefix.
- A second problem appeared after that: all pages were considered non-canonical. After quite a bit of debugging, I found that you have added a domain value in the SEF system plugin, which is fine, but internally Joomla actually expects this domain without a trailing slash (i.e. https://www.example.com instead of https://www.example.com/).
Normally, that does not cause any issue, because the SEF plugin cleans the canonical link when outputting it into the page, BUT when 4SEO was asking Joomla for the canonical value "internally", in code, it was getting an additional slash: https://www.example.com//some-article
And this broke the mechanism to decide whether a page is canonical. I added some clean up code and this is working as expected now.
- Now, the problem that got 4SEO stuck is about www: you set up an HTTP to HTTPS redirect, but not a non-www to www one, so your entire site can be accessed through both hostnames. That's bad, and you need to add a redirect (it's just a switch in sh404SEF, under the Advanced tab of the configuration).
- What threw 4SEO off was the fact that http://[redacted].com/some-article is considered an external page (because the site home address is https://www.example.com) but when requesting the page http://[redacted].com/some-article, we did get a response from the site itself.
I had to make some changes to 4SEO to accommodate that situation and it should be working OK now.
As I made the changes late yesterday, I could not try installing the latest version on your site to complete the testing, but I plan on doing that today. Obviously, for validating that code, it would help if you did NOT switch on the non-www to www redirect right away.
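
To make the two URL problems above concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not the 4SEO code (which runs inside Joomla): the helper names join_base_and_path and is_same_site are invented, and treating www and non-www hosts as the same site is just one possible way to handle a site that answers on both.

```python
from urllib.parse import urlsplit


def join_base_and_path(base: str, path: str) -> str:
    """Join a site root and a path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def _bare_host(url: str) -> str:
    """Lowercased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_same_site(url: str, home: str) -> bool:
    """True if `url` points at the same site as `home`.

    The scheme (http/https) and an optional www prefix are ignored.
    """
    return _bare_host(url) == _bare_host(home)
```

With these helpers, joining https://www.example.com/ and /some-article gives https://www.example.com/some-article rather than the double-slash version, and http://example.com/some-article is recognised as belonging to the site whose home address is https://www.example.com.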
Best regards
Yannick Gaultier
weeblr.com / @weeblr