#2636 – A lot of 404

Posted in ‘sh404SEF’
Monday, 22 August 2016 14:55 UTC
MartinT
Hi there.

I've got A LOT of 404s (2000+) from the past 2 months. I looked at the documentation and found out that I don't have any internal 404s.
I really don't know how I get so many 404s from outside, and I don't know what to do about them. Some of them have no relation at all to my site (I can see WP-LOGIN, e.g., probably a hacker bot), but some of them are relevant; I just don't know who created them, how they got there, or what I should do about them. Should I just leave them if it's not an important link? (I'm afraid I'll get punished by Google with all these 404s.)

I've attached a screenshot of some of the 404s, so you can see what kind of URLs they are and maybe give some advice.

Sincerely,
Martin
Monday, 22 August 2016 14:57 UTC
wb_weeblr
Hi

(I'm afraid I'll get punished by Google with all these 404s)
You can't get punished if the links causing those 404s are not on your site. However, what may have happened is that your site was previously indexed by Google, and your URLs changed when you installed sh404SEF?

Rgds
 
Monday, 22 August 2016 15:13 UTC
MartinT
Happy to hear that I can't be punished when they come from another site :D

Some of them might be from my old site, yes. But I can't imagine that all of them are. I don't recognize many of the URLs.

This one is fairly new and HASN'T been on the old site (it's about the Zika virus, so I know it's a new article):

item/632-laeger-giftigt-drikkevand-er-skyld-i-deformitet-hos-zika-spaedborn

It's like they just get created out of nowhere!

Maybe some of the links are from my previous site (as you know, Joomla plugins sometimes create different URLs than Joomla's own) and were indexed by Google earlier.
I did make a lot of changes (nearly every one of the old URLs has been changed).

Should I leave them be, or just redirect all of them to my frontpage?
I'm really lost here; I really don't know what to do.


Sincerely,
Martin

Monday, 22 August 2016 17:25 UTC
wb_weeblr
Hi

Should I leave them be, or just redirect all of them to my frontpage?
You should basically never do that. Redirecting is only useful if you redirect an old URL to a new URL that has very similar content. Otherwise, if the page has been removed or significantly modified (to the point where you can say it's not really the same content), having a 404 is exactly what should happen and is very good. It's a common misconception that 404s are bad (they are if you have them on your site, as outlined above).
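
A minimal sketch of that rule, assuming a hand-maintained table of old-to-new URL pairs. The paths, REDIRECTS and resolve() are hypothetical and are not part of sh404SEF; the point is only that a URL with a known, closely matching replacement gets a 301, and everything else stays a 404 instead of being sent to the frontpage.

```python
from typing import Optional, Tuple

# Hypothetical table: old path -> new path with very similar content.
# Only add an entry when the new page really replaces the old one.
REDIRECTS = {
    "old-news/42-example-article": "item/42-example-article",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, target) for a requested path that no longer exists."""
    target = REDIRECTS.get(path.strip("/"))
    if target is not None:
        # Known replacement with similar content: permanent redirect.
        return 301, "/" + target
    # No equivalent page: a plain 404 is the correct answer.
    return 404, None


if __name__ == "__main__":
    print(resolve("/old-news/42-example-article"))
    print(resolve("/wp-login.php"))
```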

item/632-laeger-giftigt-drikkevand-er-skyld-i-deformitet-hos-zika-spaedborn
Now this URL looks like a K2 URL, right? If so, and if you have updated to K2 version 2.7, have you obtained the sh404SEF plugin from K2?
Prior to version 2.7 (?), it was included in the main K2 download, but it's now a separate plugin.

Also, in order to get the most information, have you enabled "Record 404" on the "Data recording" tab of sh404SEF configuration? (see this page of documentation)

Rgds
 
Tuesday, 23 August 2016 07:38 UTC
MartinT
I get your point about 404s: it is OK to have some (and NOT to redirect them all to the same page, crazy me :D).

It probably is a K2 URL; the site is built on it.

I haven't updated K2 to version 2.7.1 yet (I think I have 2.7) but will do so right away. I did purchase that extra sh404SEF plugin for K2 (I think back in May, just after an update).

I will enable "Record 404" right away.
And "Record source", I guess it will tell me where the link came from?

Should I delete/purge all the 404s from the "404 Request manager" to get the data recorded?


Sincerely,
Martin


Tuesday, 23 August 2016 09:34 UTC
wb_weeblr
Hi

And "Record source", I guess it will tell me, from where the link came?
That will not help for 404. This is only used for tracking the source of SEF URLs created on the site. By definition, 404s are not properly created, so Record source cannot provide any additional information on 404.
You should not enable it, however, as it will record quite a lot of data on all the other URLs, the good ones. It's useful for debugging an issue over a short period of time, but be sure to disable it afterwards.
Again, in this case, it won't help.

You do not need to purge the 404 logs to have data recorded. However, data will only be recorded for 404s happening from now on, quite obviously.

Rgds
 
Tuesday, 23 August 2016 13:59 UTC
MartinT
Ok, thx.

I will try enabling "Record 404" to see what is going on. But I'll also purge the 404 log, just to be sure there is some data to analyse.

I will return to this ticket when I have some data, and hopefully I can come closer to a solution.

Sincerely,
Martin

Tuesday, 23 August 2016 14:07 UTC
wb_weeblr
Hi

I will leave this ticket open in case you need to add something. It will automatically close in 2 weeks if no further comment is made.

Best regards
 
Friday, 26 August 2016 10:35 UTC
MartinT
Hi again

I collected some data to see where the links came from.



11-fodsel/449
User agent : Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)

I followed the link and came to a page that explained it's a bot from a search engine project someone is developing.

On this page, I found the following information:

"Why do you keep crawling 404 or 301 pages?

We have a long memory and want to ensure that temporary errors, website down pages or other temporary changes to sites do not cause irreparable changes to your site profile when they shouldn't. Also if there are still links to these pages they will continue to be found and followed. Google have published a statement since they are also asked this question, their reason is of course the same as ours and their answer can be found here: Google 404 policy"


So the search engines continue searching, but for how long, if they don't find the link again?

I was thinking of blocking it, as it's not a known search engine (as I read it, it is a closed project).

Would you recommend blocking it, or pointing it to a relevant page on my site?
A lot of the 404s are from the above-mentioned bot. I'd really like to get rid of them somehow (block the bot or point to a new page), as it's so confusing with all these 404s.



Another one confuses me a lot, since it is a URL to the backend:

administrator/components/com_acymailing/inc/openflash/php-ofc-library/ofc_upload_image.php
User agent : Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6

Is it Firefox through Windows, or what could it be?



There are, of course, also a lot of links from Google and Bing, but I guess they are links from when I moved the site, so it's an easy fix.

But the other ones are really annoying; do you have any suggestions on whether to block or redirect?

Sincerely,
Martin


Friday, 26 August 2016 11:21 UTC
wb_weeblr
Hi

1 - Majestic is a well known, fairly large SEO company. They sell tools for SEO firms and consultants. It's not a problem to block them, but again, it's not a problem to have 404s anyway. Not worth spending time on this.

2 -
Is it Firefox through Windows, or what could it be?
That doesn't matter: the user agent is set by whoever makes the request, so they can say anything they want. It can't be trusted.
This looks like a hacking attempt, as this file does not exist in AcyMailing (at least in version 5.5, the current one). Do you actually have AcyMailing running on your site?
It's probably an attempt to exploit an old vulnerability in AcyMailing. That's typically something you'd want to block.
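
As a rough illustration of how the two cases above differ, here is a minimal sketch that sorts 404 log entries into "probe" (worth blocking), "crawler" (harmless noise) and "review" (worth a look). The function name, the labels and the marker lists are assumptions made for this example, not an sh404SEF feature. Since the user agent can be faked, the "crawler" label is only a hint, and the path is checked first.

```python
import re

# Assumed markers, taken from the examples in this ticket. Adjust to your site.
# Paths like these point at software that is not installed, so they are probes.
PROBE_PATHS = (
    "wp-login",
    "administrator/components/com_acymailing/",
)

# Crawler user agents whose 404s are just noise (the UA is self-declared).
CRAWLER_UA = re.compile(r"MJ12bot", re.IGNORECASE)


def classify_404(path: str, user_agent: str) -> str:
    """Label one 404 log entry as 'probe', 'crawler' or 'review'."""
    lowered = path.lower().lstrip("/")
    # The requested path is more reliable than the user agent, so test it first.
    if any(marker in lowered for marker in PROBE_PATHS):
        return "probe"
    if CRAWLER_UA.search(user_agent):
        return "crawler"
    # Everything else may be an old indexed URL that deserves a redirect.
    return "review"


if __name__ == "__main__":
    entries = [
        ("11-fodsel/449",
         "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"),
        ("administrator/components/com_acymailing/inc/openflash/php-ofc-library/ofc_upload_image.php",
         "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6"),
    ]
    for path, agent in entries:
        print(classify_404(path, agent), path)
```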

Rgds
 
Monday, 29 August 2016 07:18 UTC
MartinT
Hi

1) I see, in fact I use their Rank Tracker myself :) The reason I really want to delete these is that I can't get an overview of all the important 404s. They will get lost in the mass of 404s from Majestic.

2) No, I don't use AcyMailing; it's not even installed on my site :) I'll block those URLs.

Thx for the help. I think I'm gonna block the Majestic 404s just to get an overview (if it has no influence anyway), and the user agent's 404s.
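
One possible way to keep the Majestic crawler out of the 404 list is a robots.txt rule. The sketch below assumes MJ12bot honours robots.txt (worth checking on Majestic's bot page) and uses Python's standard urllib.robotparser only to confirm that the rule matches that bot and nothing else. The rule text and the is_blocked() helper are illustrations, not sh404SEF configuration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content: shut out MJ12bot, leave other crawlers alone.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(agent: str, path: str) -> bool:
    """True if the rules above tell this crawler not to fetch the path."""
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("MJ12bot", "Googlebot"):
        print(agent, "blocked:", is_blocked(agent, "/11-fodsel/449"))
```

This only helps with crawlers that respect robots.txt. Requests that fake their user agent, like the AcyMailing one, ignore it and have to be blocked another way.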

Sincerely,

Martin
Tuesday, 30 August 2016 07:28 UTC
wb_weeblr
Hi

OK, glad to hear this is under control. Going to close this ticket now. Feel free to open a new one as needed and if so, please mention this ticket ID in the new one.

Rgds
 