#1117 – Spurious URLs for /figures

Posted in ‘sh404SEF’
Tuesday, 06 October 2015 04:16 UTC
bwmoore22
I have a number of interactive graphics that I have included in articles using markup like the following:

<iframe src="/images/interactive/two_product_price_optimization.html" width="1024" height="768">Your browser does not support the <code>&lt;iframe&gt;</code> tag; you can view the visualization <a href="https://www.xxxx.com/images/interactive/two_product_price_optimization.html">here</a>.</iframe>

When viewing the article, everything is fine.

For each and every one of these, Google Search Console reports a "Not found" crawl error for

tag/images/interactive......

In the article, the URL has the form src="/images/interactive...", with a leading slash and no "tag" prefix.

What is introducing this URL when Google crawls my site, and how do I fix my site, my .htaccess, or my robots.txt to prevent these crawl errors? It is a pain to go in and add redirects for every single one.


For what it is worth, the best example article for the problem is

https://www.xxxx.com/Open-Source-Software/examples-of-interactive-graphics
Tuesday, 06 October 2015 10:55 UTC
wb_weeblr
Hi

1 - Having crawl errors in Google SC does not mean you have to do a redirect. You only do a redirect when there is a reason to send visitors to a new page, for instance one with content similar to the old page.
Having a 404 is perfectly valid if a page is not available.

2 - To be honest, I don't really understand your question, so let me reformulate what I understood: Google SC shows some crawling errors for URLs starting with
tag/images/interactive...
(with no leading slash)

and you are surprised about it because you don't see those URLs inside your site page?

Rgds
 
Wednesday, 07 October 2015 19:19 UTC
bwmoore22
That is correct. The URL does not (and should not) occur in my site page. I don't know why Google thinks that this URL should exist.

Thursday, 08 October 2015 07:12 UTC
wb_weeblr
Hi

OK, we're on the same page now.

1 - Do you see any of those /tag/images/... URLs in the URL Manager? (Not in the 404 manager; I do mean the URL Manager, where URLs transformed by sh404SEF are stored.)

2 - What's the history of the site? I mean, when did it open, when did you install sh404SEF?

Rgds
 
Thursday, 08 October 2015 20:09 UTC
bwmoore22
1 - Do you see any of those /tag/images/... URLs in the URL Manager? (Not in the 404 manager; I do mean the URL Manager, where URLs transformed by sh404SEF are stored.)
=================================================================================

There are no entries for "tag/images/interactive..." in the URL Manager. The only "tag" entries are of the form "tag/15" and "tag/15-amortization"; there are two for each of the tags that I have defined.

2 - What's the history of the site? I mean, when did it open, when did you install sh404SEF?
=================================================================================

June 2013 - Created (by me; first-time webmaster, but long-time programmer). Started on Joomla 3.x, upgraded as new maintenance releases became available. Plugins are Akeeba Backup Pro, Admin Tools Pro, JBetolo for a period (disabled since fall of 2014 or earlier), and Google Authorship (now disabled).

February 2015 - Installed sh404SEF to fix duplicate URL problem in Google Search Console. Didn't take site offline and took several days to completely figure out, so number of duplicates actually got worse for a period of time.

April 2015 - Switched to JCE. First articles referencing Google Trends using HTML of form <iframe src="http://google.com/trends....">

September 2015 - Installed JEvents, CMC MailChimp, and JCH-Optimize. Published the first article using interactive graphics with HTML of the form <iframe src="/images/interactive/graph.html"></iframe>. This construct caused Google crawl errors of the form "tag/images/interactive".
Friday, 09 October 2015 07:40 UTC
wb_weeblr
Hi

OK, here is what I think:

- As those URLs are not in the URL Manager, they simply were not created by sh404SEF. In other words, I don't think they actually exist on your site.
- The most likely explanation is that Google comes up with them while trying to explore your site. It's very common, for instance, for them to feed words into a search module to try to find new pages on a site. We also see a lot of "fake" URLs when their bot executes JavaScript and tries to build URLs for ajax requests.
- Iframes have always caused problems for Google, or rather, Google has never bothered much about them.
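
As an illustration only (not a confirmed diagnosis for this site), here is a small Python sketch of how a crawler could arrive at a "tag/images/interactive..." URL if it ever resolved the iframe path without its leading slash while on a tag page. The tag page address is hypothetical, modelled on the "tag/15-amortization" form seen in the URL Manager, and graph.html stands in for any of the interactive files.

```python
from urllib.parse import urljoin

# Hypothetical tag page, following the "tag/15-amortization" form in the URL Manager.
tag_page = "https://www.xxxx.com/tag/15-amortization"

# Root-relative src, as written in the article: resolved from the site root.
rooted = urljoin(tag_page, "/images/interactive/graph.html")

# Same src with the leading slash lost: resolved relative to the tag page.
unrooted = urljoin(tag_page, "images/interactive/graph.html")

print(rooted)
print(unrooted)
```

The first result stays under /images/, while the second lands under /tag/images/, which matches the shape of the reported crawl errors. Whether Google's bot actually drops the slash here is an assumption.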

To be sure those URLs don't exist on your site (even though sh404SEF doesn't create them, they might be created by other extensions or by Joomla itself), you might want to use an online sitemap maker (not a Joomla extension, as those don't show all URLs) to crawl your site and report all of your links. There's a good one at auditmypc.com (though it's Java-based, so browsers now avoid running it), and there are others.
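
If you'd rather check a few pages by hand, the same idea can be sketched in Python: collect every href and src in a page's HTML, resolve each one against the page address, and list anything that ends up under /tag/images/. This is only a rough sketch, not a replacement for a real crawler; LinkCollector, suspicious_links, and the sample HTML are made up for the example, and you would feed it the saved source of your own pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def suspicious_links(page_url, html, prefix="/tag/images/"):
    """Return absolute URLs on the page whose path starts with `prefix`."""
    collector = LinkCollector()
    collector.feed(html)
    resolved = (urljoin(page_url, link) for link in collector.links)
    return [url for url in resolved if urlparse(url).path.startswith(prefix)]


# Hypothetical page content: one root-relative iframe and one relative link.
sample = (
    '<iframe src="/images/interactive/graph.html"></iframe>'
    '<a href="images/interactive/graph.html">here</a>'
)
found = suspicious_links("https://www.xxxx.com/tag/15-amortization", sample)
print(found)
```

With this sample, only the link without a leading slash is reported, because it is the only one that resolves under /tag/images/. An empty list for your real pages would support the idea that the URLs are not present in your markup.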

Rgds
 
Saturday, 24 October 2015 05:34 UTC
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.