

#8149 – Another Site analysis does not finish

Posted in ‘4SEO’
Tuesday, 21 September 2021 08:13 UTC
smirrederfuchs-gmail-com

Hi.

On a second website, the site analysis does not finish either. I had the same issue in this ticket:
https://weeblr.com/helpdesk/4seo/8131-site-analysis-does-not-finish

I have already excluded "/events/date/{*}" in the page settings, but that does not help. The pending pages have not finished for weeks now:
https://www.screencast.com/t/aqCZOPAzGPDH

Please have a look at my site to see what is causing the problem:

Backend link:
https://[redacted].plus/administrator/index.php?foxter

htaccess: 
[redacted] / [redacted]

Joomla Super User: 
[redacted]/ [redacted]

Tuesday, 21 September 2021 08:18 UTC
wb_weeblr

Hi

1 - I cannot access the site, as it's protected by Cloudflare.

> I have already excluded "/events/date/{*}" in the page settings

Are you using the same calendar component? This exclusion only works if you have the same component and the URLs to the events pages are the same, i.e. /events/date/[redacted]xx.
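A minimal sketch of how a `{*}` rule could be matched, assuming `{*}` stands for "any sequence of characters" and that the rule must match the whole URL (path plus query string). This is not 4SEO's code; `rule_to_regex` and `matches_rule` are hypothetical names. It illustrates why the rule only helps when the component builds its event URLs under the same `/events/date/` prefix.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Turn an exclusion rule such as "/events/date/{*}" into a regex.

    Assumption: "{*}" matches any run of characters (including none) and
    everything else is literal, so the "?" in "/login?{*}" is a plain "?".
    """
    parts = rule.split("{*}")
    pattern = ".*".join(re.escape(part) for part in parts)
    return re.compile(pattern + r"\Z", re.DOTALL)

def matches_rule(url: str, rule: str) -> bool:
    """Return True if the whole URL (path plus query) matches the rule."""
    return rule_to_regex(rule).match(url) is not None

rule = "/events/date/{*}"
# A URL produced by the calendar the rule was written for is excluded...
print(matches_rule("/events/date/2021-09-21", rule))  # True
# ...but a component that builds its URLs differently is not affected.
print(matches_rule("/calendar/2021-09-21", rule))     # False
```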

Best regards

Yannick Gaultier

weeblr.com / @weeblr

Tuesday, 21 September 2021 08:23 UTC
smirrederfuchs-gmail-com

Sorry, I forgot the Cloudflare security rule. I have deactivated it now, please try again.

Yes, I also have "EasySocial" running here, which has the events calendar integrated.

Tuesday, 21 September 2021 08:29 UTC
wb_weeblr

Hi,

> Yes, I also have "EasySocial" running here, which has the events calendar integrated.

Where can I see this calendar? Please provide a link, I do not read German!

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 08:30 UTC
smirrederfuchs-gmail-com

No problem, here is the link to the events section:
https://[redacted].plus/events

Tuesday, 21 September 2021 08:38 UTC
wb_weeblr

Hi

I have looked at the error log files and I can indeed see multiple errors there.

1 - Did you add "/events/date/{*}" before starting the analysis?

2 - If you added it later, did you Reset the analysis after adding that line?

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 08:40 UTC
wb_weeblr

Hi again,

Also, I am running the analysis from the admin and it's running fine, with the number of pending URLs decreasing.

Is this how you are doing it, or do you just let it happen from the frontend?

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 08:43 UTC
smirrederfuchs-gmail-com

The analysis was started before I added "/events/date/{*}" under the page settings. It was not clear to me that I should then reset everything.

I have set up a cron job for every 2 minutes. I assumed that this is how it works.

Tuesday, 21 September 2021 08:49 UTC
smirrederfuchs-gmail-com

I checked the cron job URL and I have the feeling that it is not working:
https://[redacted].plus/?_wblapi=/forseo/v1/cron/http&k=faebf33de6

I do not get any result when I manually trigger the job. Is this normal?

Tuesday, 21 September 2021 08:56 UTC
wb_weeblr

Hi

> The analysis was started before I added "/events/date/{*}" under the page settings. It was not clear to me that I should then reset everything.

Yes, the analysis should be reset when you exclude some pages: by the time you add the exclusion, many such pages (/events/date/[redacted]) may already be in the list of pages to analyze. The exclusion rule does not remove pages already found; it only applies to newly discovered ones.
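A toy model of this behavior, assuming a simple queue where exclusion rules are checked once, when a URL is first discovered. The `CrawlQueue` class and its prefix-based rules are hypothetical simplifications, not 4SEO internals. A rule added later leaves already-queued URLs in place; only a reset lets the rule apply to them.

```python
from collections import deque

class CrawlQueue:
    """Toy crawl queue: exclusion rules are checked only at discovery time."""

    def __init__(self):
        self.pending = deque()   # URLs waiting to be analyzed
        self.seen = set()        # URLs already discovered
        self.exclusions = []     # plain prefixes here, for brevity

    def is_excluded(self, url):
        return any(url.startswith(prefix) for prefix in self.exclusions)

    def discover(self, url):
        # The filter runs here, once. Nothing re-checks queued URLs later.
        if url in self.seen or self.is_excluded(url):
            return
        self.seen.add(url)
        self.pending.append(url)

    def reset(self):
        # Forget everything found so far so rules apply again on rediscovery.
        self.pending.clear()
        self.seen.clear()

q = CrawlQueue()
q.discover("/events/date/2021-09-21")   # found before the rule existed
q.exclusions.append("/events/date/")    # rule added afterwards
q.discover("/events/date/2021-09-22")   # newly discovered: skipped
print(list(q.pending))                  # ['/events/date/2021-09-21']
q.reset()
q.discover("/events/date/2021-09-21")   # rediscovered after the reset
print(list(q.pending))                  # []
```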

I will reset the analysis and start from a fresh state.

> I have set up a cron job for every 2 minutes. I assumed that this is how it works.

An actual cron is the "backup" plan; 4SEO does not need it in most cases. The main "cron" we use is a fake cron created by adding a pixel to your frontend pages. When a visitor views a page on the frontend, this triggers a 4SEO run a few seconds later, in which 4SEO analyzes a page or two.

Then you can set up a real cron, as you did, which is guaranteed to run every N minutes. This is reliable but usually slower, because the cron does not run very often. It's needed if, for instance, your site has little or no traffic and there are no frontend visitors to trigger page analysis.

The last way to run an analysis is from the backend, using the "Analyze now" button. There you can start a full analysis and let it run until it's completed. A slider lets you adjust the crawling speed, so if your server can take it, you can go faster.

The 3 analysis methods run in parallel and do not interfere with each other.

> I checked the cron job URL and I have the feeling that it is not working:
>
> https://[redacted].plus/?_wblapi=/forseo/v1/cron/http&k=faebf33de6
>
> I do not get any result when I manually trigger the job. Is this normal?

Yes, it's normal, and you actually do get a result: the response code is 204 (No Content), which you can see if you look at the request in your browser's developer tools.
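To check the cron URL from a script instead of the browser, a sketch like the following reads the status code directly. `ping_cron` is a hypothetical helper and the URL in the comment is a placeholder for your own cron URL. Per the reply above, a working endpoint answers 204 with an empty body, which is why the browser shows nothing.

```python
import urllib.request
from typing import Tuple

def ping_cron(url: str, timeout: float = 10.0) -> Tuple[int, int]:
    """Request the cron URL once and return (HTTP status code, body length).

    urllib raises HTTPError when the final response is not 2xx, so any
    status returned here is a success code such as 200 or 204.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
        return response.status, len(body)

# Placeholder: replace with your site's actual cron URL.
# status, size = ping_cron("https://example.com/?_wblapi=/forseo/v1/cron/http&k=YOUR_KEY")
# print(status, size)  # a 204 with a 0-byte body is the "empty" result
```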

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 08:59 UTC
wb_weeblr

Hi

So I just reset the analysis from the Pages | Settings dialog and restarted it manually from the backend for maximum speed.

I will let it run in my browser until it completes or something happens, and will let you know how it goes.

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 12:45 UTC
wb_weeblr

Hi again

Just to update you, the analysis has been running for a while now.

4SEO is doing well, but it is also exploring links that it probably should not, such as user profile links:

- this is a waste of resources for us

- you don't want these in your sitemap, as you do not want Google to crawl them.

I'd suggest adding /profile/{*} to the list of exclusions and restarting the analysis. What do you think? Are there other links that we should exclude?

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 13:48 UTC
smirrederfuchs-gmail-com

You're right, we can exclude the profiles with "/profile/{*}". I will add it and rerun the analysis.

Otherwise, I can't think of anything at the moment that we can leave out. 

Tuesday, 21 September 2021 13:51 UTC
wb_weeblr

Hi

> You're right, we can exclude the profiles with "/profile/{*}". I will add it and rerun the analysis.
>
> Otherwise, I can't think of anything at the moment that we can leave out.

Exactly. Most users will never have this kind of issue, but some sites have large numbers of these pages and it's hard to avoid them automatically. I can (and I will) add plugins that detect them and avoid storing them as valid pages, but I can't prevent analyzing them in the first place, because the URLs are not always the same from site to site.

So a bit of testing at the start, to identify useless URLs, is needed in order to crawl and run more efficiently. I hope to explain that and figure out something better, maybe in the initial configuration wizard.

Best regards

Yannick Gaultier

Tuesday, 21 September 2021 14:01 UTC
smirrederfuchs-gmail-com

Indeed, I have to learn what is useful and what I can exclude from the pages.
Currently, I still use JSitemap to generate the sitemap. My plan is to remove JSitemap once I have learned to use 4SEO.

Yes, a startup wizard that asks about the components in use, to optimize the settings for n00bs like me, would be great :D

Tuesday, 21 September 2021 14:11 UTC
wb_weeblr

Hi

> Yes, a startup wizard that asks about the components in use, to optimize the settings for n00bs like me, would be great :D

I don't want to make the first-run wizard too complex either. What would be useful is not so much the components as the URLs to be excluded, as you specified. That's hard to ask for on the very first run, but I'll try to think of something...

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 09:17 UTC
smirrederfuchs-gmail-com

Hi.
One more question about excluding useless pages.

EasySocial generates a lot of URLs like this for the login page:
login?return=aHR0cHM6Ly9jbGltYmluZy5wbHVzL2Jsb2cvYWRkLWJsb2c=

So I have added this filter to exclude these URLs, but it does not work:
/login?{*}

What am I doing wrong here?
These are my current exclusion filters:
/events/date/{*}
/profile/{*}
/{*}?page={*}
/{*}?hideRepetition=1
/{*}?includePast=1
/{*}?listview=1
/{*}?listview=2
/{*}?sort=popular
/{*}?sort=alphabetical
/{*}?sort=comments
/{*}?sort=latest
/{*}/withphotos/alphabetical
/{*}/withphotos/lastlogin
/{*}/withphotos/latest
/{*}?sort=likes
/login?{*}
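The `return` value in these login URLs appears to be base64-encoded (the sample starts with `aHR0cHM6Ly9`, which is base64 for `https://`), so every page that links to the login form yields a different login URL. That is why a wildcard rule is needed at all. A short sketch with made-up example.com URLs; `make_login_url` only imitates the pattern and is not EasySocial code.

```python
import base64
from urllib.parse import parse_qs, urlsplit

def make_login_url(return_to: str) -> str:
    """Hypothetical imitation of the pattern: /login?return=<base64 of page>."""
    token = base64.b64encode(return_to.encode("utf-8")).decode("ascii")
    return "/login?return=" + token

def decode_return_param(login_url: str) -> str:
    """Decode the base64 `return` parameter back into the original URL."""
    encoded = parse_qs(urlsplit(login_url).query)["return"][0]
    # parse_qs turns a literal "+" into a space; put it back before decoding.
    encoded = encoded.replace(" ", "+")
    # Restore "=" padding in case it was stripped from the URL.
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded).decode("utf-8")

a = make_login_url("https://example.com/blog/add-blog")
b = make_login_url("https://example.com/events")
print(a != b)                  # True: each page gets its own login URL
print(decode_return_param(a))  # https://example.com/blog/add-blog
```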

Wednesday, 22 September 2021 09:20 UTC
wb_weeblr

Hi

> So I have added this filter to exclude these URLs, but it does not work:

What does "does not work" mean? You reset the analysis and the pages are still analyzed when you run it again?

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 09:28 UTC
smirrederfuchs-gmail-com

Sorry, I mean that after deleting the page analysis and restarting it, URLs with the "login?return=...." pattern are added to the Pages section again.

Wednesday, 22 September 2021 09:43 UTC
wb_weeblr

Hi 

OK, I will test this; there may be a bug here. Can you try /login{*} instead?

I'll get back to you with my own tests.

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 09:48 UTC
smirrederfuchs-gmail-com

> OK, I will test this; there may be a bug here. Can you try /login{*} instead?

Yes, I did, with the same result: the login URLs still end up in the page results.

Wednesday, 22 September 2021 09:58 UTC
wb_weeblr

Hi

My testing is not conclusive: it looks like it works, but your site is specific, as it's all EasySocial and so the login process is entirely different.

Could you please provide us with Super User credentials for your website? You can create a temporary account and delete it afterward, but it must have Super User level.

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 10:00 UTC
wb_weeblr

I think this is happening because the URL that needs to be tested comes after a redirect. 4SEO follows redirects to see "where they go", and I don't think I apply the exclusion rules at each redirect (there may be multiple redirects in a chain).

I'll check that in the code now, but please do provide credentials so I can check the settings and test anyway.

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 10:02 UTC
smirrederfuchs-gmail-com

OK, please use the login details from the first post. I have enabled the login for the Super User.

Wednesday, 22 September 2021 10:04 UTC
wb_weeblr

Hi

Thanks for that. I did not ask, but am I correct in assuming that the other exclusion rules work, and it's only /login/[redacted] that fails?

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 10:08 UTC
smirrederfuchs-gmail-com

So far, yes. I set up all the filters by testing: running the page analysis, identifying useless URLs and adding them to the filter. After that, I reran the page analysis and the URLs were gone.
Only the login URLs are causing trouble.

Wednesday, 22 September 2021 10:10 UTC
wb_weeblr

Hi again,

Yes, I checked the code: in case of a redirect, I do not re-apply the exclusion list, so a URL can still be collected, even if excluded, when it comes after one or more redirects.

I will correct that problem, install the new version and check that it all works today. I will let you know when done.

Best regards

Yannick Gaultier

Wednesday, 22 September 2021 15:01 UTC
wb_weeblr

Hi again,

I have installed a newer version that seems to apply exclusion rules to redirects as well. In that case, the initial URL is now entirely discarded.
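The fix can be pictured as follows, assuming redirects are given as a simple map from URL to target (a stand-in for real HTTP 301/302 responses) and that `{*}` means "any text". The names `is_excluded` and `resolve` are hypothetical, not 4SEO's code; the point is that the rules are checked at every hop and the starting URL is dropped if any hop is excluded.

```python
import re
from typing import Dict, List, Optional

def is_excluded(url: str, rules: List[str]) -> bool:
    """True if the URL matches any rule; "{*}" is assumed to match any text."""
    for rule in rules:
        pattern = ".*".join(re.escape(part) for part in rule.split("{*}"))
        if re.fullmatch(pattern, url, re.DOTALL):
            return True
    return False

def resolve(url: str, redirects: Dict[str, str], rules: List[str],
            max_hops: int = 10) -> Optional[str]:
    """Follow a redirect chain, checking the exclusion rules at every hop.

    Returns the final URL, or None when the starting URL must be discarded:
    some hop is excluded, or the chain is longer than max_hops (likely a loop).
    """
    current = url
    for _ in range(max_hops + 1):
        if is_excluded(current, rules):
            return None              # discard the initial URL entirely
        target = redirects.get(current)
        if target is None:
            return current           # no more redirects: final destination
        current = target
    return None

# Hypothetical data: a members-only page that redirects to the login form.
redirects = {"/members-only": "/login?return=abc"}
rules = ["/login?{*}"]
print(resolve("/members-only", redirects, rules))  # None
print(resolve("/events", redirects, rules))        # /events
```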

I just started the analysis now and will leave it running until it completes.

Best regards

Yannick Gaultier

Thursday, 23 September 2021 09:16 UTC
wb_weeblr

Hi again,

I have taken a quick look at the results so far and I think you have many pages that should at least be excluded from the sitemap, if not from the analysis.

You have many pages with these parameters:

- from=listing

- from=user

These should not be analyzed at all.

- /cms/tag/*

These should probably not be analyzed. At least exclude them from your sitemap so that Google has less incentive to crawl them.
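A sketch of the decision described above, assuming pages are classified by their `from` query parameter and their path. In 4SEO this is done with exclusion rules rather than code; `classify`, its labels and the sample paths other than `/cms/tag/` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

SKIP_FROM_VALUES = {"listing", "user"}   # from=listing, from=user
NO_SITEMAP_PREFIXES = ("/cms/tag/",)     # tag pages

def classify(url: str) -> str:
    """Return "skip", "no-sitemap" or "include" for one URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if SKIP_FROM_VALUES.intersection(params.get("from", [])):
        return "skip"          # do not analyze at all
    if parts.path.startswith(NO_SITEMAP_PREFIXES):
        return "no-sitemap"    # at least keep it out of the sitemap
    return "include"

for url in ("/groups/12?from=listing", "/users?from=user",
            "/cms/tag/training", "/events"):
    print(url, "->", classify(url))
```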

Best regards

Yannick Gaultier

Thursday, 23 September 2021 09:27 UTC
wb_weeblr

Hi again,

Another, kinda unrelated, SEO issue on the site: I found quite a few pages that you probably deleted or moved (example: /blog/9er-mit-66-training-laueft).

But instead of triggering a 404 as they should, they redirect to [redacted]/404, and that page returns a 200 response code. You should never redirect to a 404. I'm not sure how you are doing this, but either you redirect to a proper, newer or equivalent page, or you trigger a real 404, without a redirect. That's Joomla's normal behavior and it should not be changed.
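A rough way to spot this "soft 404" pattern, assuming each link comes with its chain of `(url, status)` responses and that the custom error page lives at `/404`. The `diagnose` helper and its labels are hypothetical, and the 301 in the example is an assumption, since the thread does not say which redirect code is used.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Hypothetical: path of the site's custom error page.
ERROR_PAGE_PATH = "/404"

def diagnose(chain: List[Tuple[str, int]]) -> str:
    """Classify a chain of (url, status) responses, first request first."""
    final_url, final_status = chain[-1]
    redirected = len(chain) > 1
    if final_status == 404:
        return "redirect to a 404" if redirected else "real 404"
    if (redirected and final_status == 200
            and urlsplit(final_url).path == ERROR_PAGE_PATH):
        return "soft 404: redirect to an error page answering 200"
    return "ok"

print(diagnose([("/blog/9er-mit-66-training-laueft", 301), ("/404", 200)]))
# soft 404: redirect to an error page answering 200
print(diagnose([("/blog/9er-mit-66-training-laueft", 404)]))
# real 404
```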

Best regards

Yannick Gaultier

Friday, 24 September 2021 07:45 UTC
smirrederfuchs-gmail-com

Hi Yannick. 

Thank you for your tips! 

I'm constantly adding more filters to the exclusion options for the page and performance analysis.

> Another, kinda unrelated, SEO issue on the site: I found quite a few pages that you probably deleted or moved (example: /blog/9er-mit-66-training-laueft).

How did you find these SEO issues?

> But instead of triggering a 404 as they should, they redirect to [redacted]/404, and that page returns a 200 response code. You should never redirect to a 404. I'm not sure how you are doing this, but either you redirect to a proper, newer or equivalent page, or you trigger a real 404, without a redirect. That's Joomla's normal behavior and it should not be changed.

Yes, that is because I set up a Joomla article (styled to my needs) as a replacement for the default Joomla 404 view. I did not know that this creates a disadvantage in terms of redirects and response codes.
In our upcoming new site design (based on Yootheme Pro and J4), the 404 error page is, as far as I have seen, fully compatible with real 404 triggering. This would fix the problem I have right now.

Friday, 24 September 2021 08:48 UTC
wb_weeblr

Hi

> How did you find these SEO issues?

I found these by looking at the log files while researching the issue. I noticed some redirects from them to the /404 page, which was suspicious (there should not be a redirect to a 404 page), so I checked them.

Within 4SEO, you'll find those under Errors | Broken links, filtering for the Redirect error type. This lists links that go through one or more redirects before reaching their final destination, which is generally not a good thing but not a huge issue overall. However, this should not happen for a 404: a missing page should either be redirected to a replacement page or just trigger a 404 error code.
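The idea behind the Redirect error type can be sketched as counting hops per link. This assumes the redirects are known as a URL-to-target map; `follow` and `redirect_report` are hypothetical helpers, not 4SEO's implementation.

```python
from typing import Dict, List, Tuple

def follow(url: str, redirects: Dict[str, str],
           max_hops: int = 10) -> Tuple[str, int]:
    """Return (final URL, number of redirects) for one link."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
    return url, hops

def redirect_report(links: List[str],
                    redirects: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """List (link, final URL, hops) for every link that is redirected."""
    report = []
    for link in links:
        final, hops = follow(link, redirects)
        if hops > 0:
            report.append((link, final, hops))
    return report

redirects = {"/old-page": "/new-page", "/a": "/b", "/b": "/c"}
print(redirect_report(["/old-page", "/a", "/home"], redirects))
# [('/old-page', '/new-page', 1), ('/a', '/c', 2)]
```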

> Yes, that is because I set up a Joomla article (styled to my needs) as a replacement for the default Joomla 404 view. I did not know that this creates a disadvantage in terms of redirects and response codes.

I suspect you may have followed a tutorial that stayed on the official Joomla documentation pages for a long time (I actually had it removed, but it stayed there for a long time).

Using an article is not the best solution in any case, because A/ it can get indexed and B/ it can show up in results when users use the search function on your site.

> In our upcoming new site design (based on Yootheme Pro and J4), the 404 error page is, as far as I have seen, fully compatible with real 404 triggering. This would fix the problem I have right now.

sh404SEF has long provided a solution for that, with a specific 404 handler that suggests links similar to the one that was requested. For instance, if you try to reach https://weeblr.com/helpdesk/4seo/8149-another-site-analysis-does-not-finishh (notice I wrote "finishh" with two h's instead of one), you'll see the sh404SEF error page:

- fully integrated in your template

- suggesting the correct link as the first option.
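sh404SEF's actual matching logic is not described in this ticket. As a stand-in, Python's `difflib.get_close_matches` shows the general idea of suggesting the closest known URL for a mistyped one; `suggest` and the list of known URLs are hypothetical.

```python
import difflib
from typing import List

def suggest(requested: str, known_urls: List[str], n: int = 3) -> List[str]:
    """Return up to n known URLs most similar to the requested one, best first."""
    return difflib.get_close_matches(requested, known_urls, n=n, cutoff=0.6)

known = [
    "/helpdesk/4seo/8149-another-site-analysis-does-not-finish",
    "/helpdesk/4seo/8131-site-analysis-does-not-finish",
    "/helpdesk/4seo",
]
typo = "/helpdesk/4seo/8149-another-site-analysis-does-not-finishh"
print(suggest(typo, known)[0])
# /helpdesk/4seo/8149-another-site-analysis-does-not-finish
```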

This 404 error handling feature has not been ported to 4SEO yet but will be in one of the next versions.

I have always found the 404 handling in Joomla and in templates lacking, as it does not really offer users a solution. Yes, a search box is nice, but there's already a search box on your site, so it's not much help.

Best regards

Yannick Gaultier

Friday, 24 September 2021 10:10 UTC
smirrederfuchs-gmail-com

> This 404 error handling feature has not been ported to 4SEO yet but will be in one of the next versions.

Yannick, that is great news to me. I'm even more pleased that I invested in your component!

Friday, 24 September 2021 10:12 UTC
smirrederfuchs-gmail-com

BTW, I'm currently using this method for the custom 404 page redirect:
https://www.joomlashack.com/blog/joomla/custom-404/

Friday, 24 September 2021 11:00 UTC
wb_weeblr

Hi

Yes, this is their second version and the first had the same problem as the Joomla one:

> Our first 404 tutorial was helpful to a lot of people. However, some users also wanted the status code of the page to be a 404.

Note that you still have the issues I mentioned above, which arise from the fact that you're using an actual article. But that's still way better than before.

Best regards

Yannick Gaultier

Monday, 25 October 2021 05:34 UTC
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.