#7113 – Large list of 404s

Posted in ‘sh404SEF’
Wednesday, 22 April 2020 19:41 UTC
dpitfield

I have been looking at my back-end since installing sh404SEF a few weeks ago, and there is a large number of 404s. Some are legitimate, resulting from the switch from Mijo SEF. I'm at 9600 404 URLs already. Is this a concern?

I have fully removed Mijo SEF now FYI.

xxxx.com

It looks like some dynamic URLs are getting listed in the 404 list. Is there a way to prevent this, or is it normal to let these accumulate and grow? I'm not sure how to handle these. Please advise.

URLs, such as:
possession-ssgs/page-4
attacking-functional-drills/page-4

Please let me know how to handle these.

Thanks!
Darren

Thursday, 23 April 2020 08:45 UTC
wb_weeblr
Hi

I'm at 9600 404 URLs already. Is this a concern?
The actual number has no meaning and is of no concern. 404s are not an SEO problem in themselves.

It looks like some dynamic URLs are getting listed in the 404 list. Is there a way to prevent this, or is it normal to let these accumulate and grow? I'm not sure how to handle these. Please advise.
404s are just 404s. They have no meaning and are not the sign of a problem by themselves. It just means that someone requested a page that does not exist on your site and this was recorded. You probably have plenty of 404s that are for WordPress files or things like that.

Likewise for what you call "dynamic URLs". URLs are URLs; whether they have query parameters or are just a string of characters does not matter (although of course, for various reasons, you want to keep them as short and readable as possible).

The only issue 404s can point to, and the reason they need monitoring, is a bad link on your site or the failure of an extension.

If I take the example of attacking-functional-drills/page-4, I visited: attacking-functional-drills. There I can see a pagination list where the links for page 2, page 3, etc. are:

/functional-training/attacking-functional-drills/page-2
/functional-training/attacking-functional-drills/page-3
/functional-training/attacking-functional-drills/page-4

In addition, when you go to page 2, and look up the link back to page one, it's:

/functional-training/attacking-functional-drills/

So I would think that when you were using Mijo, that page URL was

/attacking-functional-drills/
and subsequent pages were:

/attacking-functional-drills/page-2
/attacking-functional-drills/page-3
...

Either you imported the 1st page, attacking-functional-drills/ into sh404SEF, or you manually customized it, which is why the link /attacking-functional-drills/ is still working.
But page 2, 3 and 4 were not imported or customized, and so those pages are created automatically by sh404SEF following the normal structure for categories which is:

/functional-training/attacking-functional-drills/
/functional-training/attacking-functional-drills/page-2
/functional-training/attacking-functional-drills/page-3
/functional-training/attacking-functional-drills/page-4

Hence /attacking-functional-drills/page-4 does not exist. It would be interesting to know whether /attacking-functional-drills/page-2 and /attacking-functional-drills/page-3 exist in the SEF URL manager and work, or whether they are also in the 404 list.

One way or the other, this does not denote an operational issue with the site. All works normally. What can be an issue is the change in URLs. As those URLs were not imported and no redirect was created, Google will need to update their index. Those are the type of 404s that you should redirect to their new version (use the Redirect to SEF button on the 404 manager).
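To illustrate the matching described above, here is a minimal Python sketch that proposes a redirect target for an old URL by looking for a live URL that ends with the same path segments. It is not part of sh404SEF: the function name `suggest_redirects` and the sample lists are assumptions made for illustration, and the live URLs are simply the ones quoted in this ticket.

```python
def suggest_redirects(not_found_paths, live_paths):
    """For each 404 path, list live URLs whose trailing segments match it.

    Hypothetical helper for illustration only; not an sh404SEF API.
    """
    suggestions = {}
    for old in not_found_paths:
        old_segments = [s for s in old.split("/") if s]
        matches = []
        if old_segments:
            for live in live_paths:
                live_segments = [s for s in live.split("/") if s]
                # Candidate: the live URL is longer than the old one and
                # ends with exactly the old URL's segments.
                if (len(live_segments) > len(old_segments)
                        and live_segments[-len(old_segments):] == old_segments):
                    matches.append(live)
        suggestions[old] = matches
    return suggestions


# Sample data taken from the URLs mentioned in this ticket.
not_found = ["attacking-functional-drills/page-4", "ads.txt"]
live = [
    "/functional-training/attacking-functional-drills/",
    "/functional-training/attacking-functional-drills/page-2",
    "/functional-training/attacking-functional-drills/page-3",
    "/functional-training/attacking-functional-drills/page-4",
]

for old, targets in suggest_redirects(not_found, live).items():
    print(old, "->", targets or "no candidate, review manually")
```

A 404 with exactly one candidate is a good redirect to review first; one with no candidate (such as ads.txt here) is more likely a request for something that never existed.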

There's no short version to what to do with 404s. Most are no concern and do not require any action.

As you switched SEF extensions and, as we just saw, probably not 100% of the URLs were transferred, this is what you should be monitoring: redirect those that are legitimate URLs from the "old" site to their equivalent on the "new" site. You can use the time period filter on the 404 URL manager to only show 404s from the last hour, last day and so on. Looking at the details of a 404 will show the user agent. If it's Googlebot, you may want to check whether the 404 is legit (i.e. an old, bad link) or whether Google is trying to read a URL that was changed during the transfer. If so, redirect that to the new URL.
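The same triage can be sketched outside the 404 manager, for instance on an exported list. The Python below is only a sketch under assumptions: the record fields (`path`, `user_agent`, `last_hit`) and the sample values are hypothetical and do not reflect sh404SEF's actual data format. It keeps the 404s that were hit within a recent time window by a client identifying itself as Googlebot.

```python
from datetime import datetime, timedelta, timezone


def recent_googlebot_404s(records, now, window=timedelta(days=1)):
    """Keep 404 records hit within `window` whose user agent mentions Googlebot.

    The record layout is a hypothetical one chosen for this example.
    """
    cutoff = now - window
    return [
        r for r in records
        if r["last_hit"] >= cutoff and "googlebot" in r["user_agent"].lower()
    ]


# Hypothetical sample records.
now = datetime(2020, 4, 23, 9, 0, tzinfo=timezone.utc)
records = [
    {"path": "attacking-functional-drills/page-4",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
     "last_hit": datetime(2020, 4, 23, 6, 0, tzinfo=timezone.utc)},
    {"path": "ads.txt",
     "user_agent": "SomeBot/1.0",
     "last_hit": datetime(2020, 4, 23, 7, 0, tzinfo=timezone.utc)},
    {"path": "possession-ssgs/page-4",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
     "last_hit": datetime(2020, 4, 20, 12, 0, tzinfo=timezone.utc)},
]

for record in recent_googlebot_404s(records, now):
    print(record["path"])
```

Keep in mind that a user agent string can be spoofed, so this only narrows down which 404s to look at first.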

Best regards

Yannick Gaultier
weeblr.com
@weeblr

 
Thursday, 23 April 2020 16:53 UTC
dpitfield
Thanks for the above information.

I see a situation I have where my top 404s are either a .txt file or an image file (i.e. jpg).
Is there a way for me to locate these image/file issues so I can fix them? I have no idea where they are coming from. Please see the attached screenshot showing the .txt URL and the image ones.

Thanks in advance.
Darren
Thursday, 23 April 2020 16:53 UTC
dpitfield
Did not take image, trying again.
Thursday, 23 April 2020 16:54 UTC
dpitfield
Thursday, 23 April 2020 16:56 UTC
wb_weeblr
Hi

I see a situation I have where my top 404s are either a .txt file
A request for a text file does not look like a real visitor; it sounds like a bot looking for information.

an image file (i.e. jpg).
But do you know for a fact that those images exist?

Images do not go through sh404SEF (or the Joomla SEF process, for that matter). They are just actual files sitting on your server. If you don't know that those files actually exist on your server, then these are just random bot requests.
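One way to check whether the requested files exist, rather than browsing folders by hand, is to test each path against the site's root folder on disk. This is a minimal sketch; the function name `missing_static_files` and the web root `/var/www/html` are assumptions, so substitute your own site's root folder.

```python
from pathlib import Path


def missing_static_files(web_root, requested_paths):
    """Return the requested paths that are not regular files under web_root."""
    root = Path(web_root)
    return [
        p for p in requested_paths
        if not (root / p.lstrip("/")).is_file()
    ]


# Hypothetical web root; the paths are the ones reported in this ticket.
web_root = "/var/www/html"
for path in missing_static_files(web_root, ["/ads.txt", "/apple-touch-icon.png"]):
    print("not on disk:", path)
```

Any path reported as not on disk was never a real file on the site, so a 404 for it is the correct response.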

What are the actual full requests that you see listed there?

Best regards

Yannick Gaultier
weeblr.com
@weeblr
 
Thursday, 23 April 2020 16:57 UTC
wb_weeblr
Hi

Just copy/paste one or 2 such links from the 404 list.

Best regards

Yannick Gaultier
weeblr.com
@weeblr
 
Thursday, 23 April 2020 17:35 UTC
dpitfield

I tried these and no results:
https://www.xxxx.com/ads.txt
https://www.xxxx.com/apple-touch-icon.png

I looked on the server and do not see ads.txt file anywhere.
I am not sure that those images exist. It's quite a large site; I looked in the usual spots but cannot see them.

Is there a way to block these requests? Would you recommend that?

Thanks,
Darren

Thursday, 23 April 2020 17:42 UTC
wb_weeblr

Hi

Those are direct links to files; whether the files are present or not is not influenced by Joomla or sh404SEF.

I looked on the server and do not see ads.txt file anywhere.

Which means this is most likely a random bot request.

https://www.xxxx.com/apple-touch-icon.png

This is a standard location for an Apple touch icon, one that Safari will try to load when your pages are viewed from an iPhone (BTW you do not provide a favicon for your site; you should, it's a nice touch).

Is there a way to block these requests? Would you recommend that?

You can block at server level (in your .htaccess for instance). If you do that, you'll save a tiny bit of server resources, as this avoids loading Joomla just to display a 404 error page that we know no actual human being will see. It's a good idea in general, until you start blocking many, many files in your .htaccess and waste resources there as well, and maybe block some real requests by mistake.

If your server is not giving you any resource problem, I would not bother with all that.
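If you do decide to block a handful of paths, the lines to add to .htaccess can be generated from a short list. The sketch below is a rough illustration, not a recommendation: it assumes an Apache server with mod_alias enabled and .htaccess overrides allowed, and the function name `htaccess_404_rules` is hypothetical. It uses the `Redirect` directive with a 404 status so that Apache answers directly instead of handing the request to Joomla.

```python
def htaccess_404_rules(paths):
    """Build one `Redirect 404` line per path (assumes Apache mod_alias)."""
    rules = []
    for path in paths:
        # Redirect expects a URL path starting with a slash.
        if not path.startswith("/"):
            path = "/" + path
        rules.append(f"Redirect 404 {path}")
    return "\n".join(rules)


# Example with a path from this ticket that does not exist on the server.
print(htaccess_404_rules(["ads.txt"]))
```

Note that `Redirect` matches by path prefix, and a path containing spaces would need quoting, which this sketch does not handle. Keep the list short and review it before deploying, for the reasons given above.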

Best regards

Yannick Gaultier
weeblr.com
@weeblr

 
Friday, 08 May 2020 05:34 UTC
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.