
Pages analysis

The analysis stops immediately, no page is scanned (Reason #1): I am running Joomla on my local machine

This usually happens if you run your local site under https. On a local web server, this often means using a self-signed certificate, which PHP will not accept when certificate validation is enabled. As a result, each time 4SEO tries to load a page in the background to analyze it, the request fails with a 0 error code.

The fix is to open the Settings dialog on the Pages page, go to the Site analysis tab and make sure Validate TLS certificates is disabled.
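4SEO itself is written in PHP, and its code is not shown here. The Python sketch below is only an illustration of the general idea. It builds a strict TLS context, which rejects a self-signed local certificate, and a relaxed one, which accepts any certificate. Conceptually, these are the two behaviors the Validate TLS certificates option switches between. The function name `make_tls_context` is hypothetical.

```python
import ssl


def make_tls_context(validate_certificates: bool) -> ssl.SSLContext:
    """Illustrative only (not 4SEO code): TLS settings for a background crawler.

    validate_certificates=True: the server certificate must chain to a trusted
    CA and match the hostname, so a self-signed local certificate is rejected.
    validate_certificates=False: any certificate is accepted, which makes a
    self-signed local development site reachable but removes that protection.
    """
    context = ssl.create_default_context()
    if not validate_certificates:
        # check_hostname must be disabled before verify_mode can be set to CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_tls_context(validate_certificates=True)
relaxed = make_tls_context(validate_certificates=False)
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)  # True True
print(relaxed.verify_mode == ssl.CERT_NONE, relaxed.check_hostname)    # True False
```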

After that, restart the analysis:

  • Use the Reset analysis button on the same Settings tab.
  • Start the analysis with the Analyze now button.

The analysis stops immediately, no page is scanned (Reason #2)

Another reason for this is that your site is password-protected. In that case, 4SEO cannot load your site's pages and therefore cannot analyze them.

4SEO can still analyze such a site if you use the common protection method of setting a username and password through your .htaccess file.

If you set up this username and password through your hosting company's control panel, it normally uses the .htaccess method as well, so 4SEO can work with it too.

The fix is to open the Settings dialog on the Pages page and go to the Site analysis tab. Scroll down to the Restricted access section and enter your site's protection username and password in the corresponding input fields.
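Protection set up through .htaccess usually relies on HTTP Basic authentication (Apache's "AuthType Basic"). With this method, each request carries the username and password in an Authorization header. The minimal sketch below assumes that scheme and shows what such a header looks like. It is not 4SEO's code, and "user"/"pass" are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Authorization header for HTTP Basic authentication (assumed scheme).

    The credentials are joined as "username:password", base64-encoded and
    prefixed with "Basic ". Illustration only, not 4SEO's implementation.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("user", "pass"))
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Base64 is an encoding, not encryption. For that reason, Basic authentication credentials should only travel over https on a live site.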

After that, you can restart the analysis by:

  • using the Reset analysis button on the same Settings tab
  • Start the analysis with the Analyze now button.

The analysis stops immediately, no page is scanned (Reason #3)

When you use a full-page caching system, 4SEO cannot read the content of your pages, because each page is not actually built by your Joomla site (where 4SEO lives) but returned directly from the cache. Full-page caching solutions include:

  • Cloudflare or another CDN, when configured to cache the entire page HTML
  • Some 3rd-party Joomla caching plugins
  • Some server-level caching systems, such as Varnish
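One way to check whether a page reaches you from one of these caches is to look at its response headers, for example in your browser's developer tools. The sketch below is a heuristic, not an exhaustive check. It relies on common conventions: Cloudflare's `CF-Cache-Status` header, the widely used `X-Cache` header, and the standard `Age` header, which caches such as Varnish set on cached responses. Some caches set none of these. The function name is hypothetical.

```python
def served_from_cache(headers: dict[str, str]) -> bool:
    """Heuristic: did this HTML response come from a cache layer?

    Header names are compared case-insensitively. A False result does not
    prove there is no cache, since not every cache sets these headers.
    """
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    # Cloudflare reports its cache result here (HIT, MISS, DYNAMIC, ...)
    if h.get("cf-cache-status") == "hit":
        return True
    # Many caches and CDNs send e.g. "X-Cache: HIT" or "X-Cache: HIT from <proxy>"
    if h.get("x-cache", "").startswith("hit"):
        return True
    # Age > 0 means a cache stored the response before sending it
    age = h.get("age", "0")
    return age.isdigit() and int(age) > 0


print(served_from_cache({"CF-Cache-Status": "HIT"}))     # True
print(served_from_cache({"Content-Type": "text/html"}))  # False
```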

If you use one of these, or a similar solution, you should enable the 4SEO cache-bypass feature:

  • Open the Settings dialog on the Pages page
  • Go to the Site analysis tab
  • Scroll down to the External cache bypass (CDN) option and enable it

This has no effect on your site caching

The 4SEO external cache bypass only applies to 4SEO itself, when it crawls your site. Your site is still cached as expected for your regular visitors and for search engines.
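This page does not describe how 4SEO implements its bypass. The sketch below shows one common, generic technique as an assumption, not as 4SEO's method. A crawler can make each request URL unique by adding a changing query-string value, so that a cache keyed on the full URL holds no stored copy and must forward the request to Joomla. It can also send `Cache-Control: no-cache`. The parameter name `nocache_ts` is hypothetical. Regular visitors never use such URLs, which is why their cached pages are unaffected.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busting_request(url: str, param: str = "nocache_ts") -> tuple[str, dict[str, str]]:
    """Generic cache-bypass sketch (assumption, not 4SEO's implementation).

    Appends a hypothetical query parameter whose value changes on every call,
    and returns request headers asking caches to go back to the origin.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, str(time.time_ns())))  # unique value per request
    busted = urlunsplit(parts._replace(query=urlencode(query)))
    # Some CDNs ignore these headers; the unique URL is what forces a cache miss,
    # unless the CDN is configured to ignore query strings in its cache key.
    headers = {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    return busted, headers


busted_url, request_headers = cache_busting_request("https://example.com/blog?page=2")
print(busted_url)  # e.g. https://example.com/blog?page=2&nocache_ts=<timestamp>
```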

Cloudflare

What's described above applies only when Cloudflare is configured to cache the full content of your pages. If, like most sites, you use the Cloudflare CDN only to cache images, JavaScript files or CSS files, this does not affect 4SEO site analysis and you do not need to do anything.

The analysis stops immediately, no page is scanned (Reason #4)

If you run your site on a private or dedicated server behind a caching layer (Varnish, for instance) or any other kind of proxy, a misconfiguration of your server's DNS can, on rare occasions, prevent 4SEO from analyzing your site.

This happens when your server's DNS is unable to resolve your website's hostname. A typical sign is that 4SEO takes a long time (usually around 30 seconds) to start crawling the first page, then fails without scanning any pages.

To fix this, the server's DNS configuration must be corrected so that your website's hostname can be resolved from the server itself.
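To confirm this diagnosis, run a resolution check on the web server itself, not on your own computer. The sketch below uses Python's standard `socket.getaddrinfo` and measures how long the lookup takes. A slow failure points to a resolver timing out. The function name is hypothetical. Replace "localhost" with your site's hostname.

```python
import socket
import time


def check_resolution(hostname: str) -> tuple[bool, float]:
    """Try to resolve `hostname` from this machine; return (resolved, seconds).

    A failure that takes tens of seconds suggests the server's resolver is
    timing out rather than answering.
    """
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        resolved = True
    except socket.gaierror:
        resolved = False
    return resolved, time.monotonic() - start


# Replace "localhost" with your website's hostname and run this on the server.
ok, elapsed = check_resolution("localhost")
print(f"resolved={ok} in {elapsed:.2f}s")
```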

How long will the analysis take?

As usual, it depends. The most important thing for us is that the analysis process does not slow down your site when pages are displayed to visitors. The analysis runs in the background and tries to balance finishing quickly against not slowing down the site, but not slowing down the site always has the highest priority.

Two things affect analysis duration:

  • how large the site is
  • how much traffic (how many visitors) your site receives

See the Pages documentation for a more detailed explanation of how background analysis works.

On large sites, with tens of thousands of pages or more, the analysis takes longer. A site with a lot of visitors is analyzed much more quickly; if you have only a few visitors a day, the analysis will be much slower.

Full site analysis can take anywhere from a few seconds to several days.
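As a rough illustration of why traffic matters so much, the back-of-envelope sketch below assumes a hypothetical model in which each visitor page view lets the background process analyze a fixed number of pages. 4SEO's real pacing is not documented here and is adaptive, so the numbers are for illustration only. The useful point is the ratio: with 20 times less traffic, the same site takes 20 times longer.

```python
def estimate_analysis_days(total_pages: int, page_views_per_day: int, pages_per_view: float) -> float:
    """Hypothetical model: days = pages / (daily page views * pages analyzed per view).

    Not 4SEO's actual algorithm; it only shows how duration scales with
    site size and with traffic.
    """
    if page_views_per_day <= 0 or pages_per_view <= 0:
        raise ValueError("this model needs positive traffic and pages per view")
    return total_pages / (page_views_per_day * pages_per_view)


print(estimate_analysis_days(10_000, 20_000, 2))  # 0.25 (busy site)
print(estimate_analysis_days(10_000, 1_000, 2))   # 5.0 (quiet site)
```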

Here are a few things to consider:

  • You can run the analysis manually from the 4SEO control panel, on the Pages page, using the Analyze now toolbar button.
  • You can instruct 4SEO to skip parts of your site that have no SEO value, for instance a forum, which has many pages but few with actual content. Use the Excluded pages option in the Settings of the Pages page for that.
  • Many 4SEO features work before the analysis is complete: social network sharing, structured data generation, redirections, content replacement, and more.
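To illustrate how excluding a whole section cuts down the work, the sketch below matches page paths against wildcard patterns. The pattern syntax here (shell-style wildcards on the URL path, where "*" also matches "/") is an assumption made for the example. Check the Excluded pages option itself for the syntax 4SEO actually accepts. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_excluded(url: str, patterns: list[str]) -> bool:
    """True if the URL path matches any pattern (hypothetical wildcard syntax).

    fnmatchcase lets "*" match any characters, including "/", so "/forum/*"
    covers every page below /forum/.
    """
    path = urlsplit(url).path or "/"
    return any(fnmatchcase(path, pattern) for pattern in patterns)


excluded = ["/forum/*", "/component/users/*"]
print(is_excluded("https://example.com/forum/topic/1234", excluded))  # True
print(is_excluded("https://example.com/blog/seo-tips", excluded))     # False
```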

Sitemap generation

4SEO generates a single sitemap at the address https://<yoursite>/sitemap-4seo.xml

Why do I get a "Service unavailable" error trying to load the sitemap?

A sitemap should only be generated once a site has been fully analyzed, because that's the only way to find out which pages should be in the sitemap and which ones should be discarded (typically duplicates).

Until the site analysis is complete, if you or Google request the sitemap, 4SEO responds with a 503 Service unavailable status. This is what search engines expect when a page is being worked on and is not ready yet.

The 503 response code tells visitors: this page is not ready now, but it will be soon, so come back later.

This response is normal and expected by search engines.
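The decision described above can be sketched as a small response function: 503 while the analysis is running, and the sitemap with a 200 status once it is done. This is an illustration, not 4SEO's code. The function name is hypothetical. The `Retry-After` header is a standard, optional hint that often accompanies a 503, but whether 4SEO sends it is not stated here, and the one-hour value is arbitrary.

```python
from http import HTTPStatus


def sitemap_response(analysis_complete: bool, sitemap_xml: str) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a sitemap request (illustrative sketch).

    503 tells crawlers the sitemap is temporarily unavailable, which is
    different from 404 (missing) or 200 with an empty list.
    """
    if not analysis_complete:
        # Retry-After in seconds: optional hint, arbitrary value for this example
        headers = {"Retry-After": "3600", "Content-Type": "text/plain; charset=utf-8"}
        return HTTPStatus.SERVICE_UNAVAILABLE.value, headers, "Sitemap not ready yet, please retry later."
    return HTTPStatus.OK.value, {"Content-Type": "application/xml; charset=utf-8"}, sitemap_xml


status, headers, _ = sitemap_response(analysis_complete=False, sitemap_xml="")
print(status)  # 503
```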

Why is there no page listed in the sitemap file itself?

If you open the file https://<yoursite>/sitemap-4seo.xml, you won't see the list of page addresses you may be used to. 4SEO uses the sitemap index file format: the sitemap-4seo.xml file does not list URLs directly but instead lists other, partial sitemaps.

This format handles very large websites, with hundreds of thousands of pages, without any issue. It works just as well for much smaller sites with only a few pages.

4SEO will only put up to 1000 pages in each partial sitemap file. This makes each file small, fast and easy to manipulate and update.
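The sketch below illustrates the structure described above. It splits page URLs into partial sitemaps of at most 1000 entries, then builds an index document that lists each partial file, using the standard sitemaps.org namespace. The partial file names are hypothetical, and 4SEO's real names may differ. The code is an illustration, not 4SEO's generator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_PARTIAL = 1000  # the per-file limit described above


def split_into_partials(urls: list[str], size: int = MAX_URLS_PER_PARTIAL) -> list[list[str]]:
    """Split page URLs into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(partial_urls: list[str]) -> str:
    """Build a <sitemapindex> document with one <sitemap><loc> per partial file."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in partial_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(root, encoding="unicode")


pages = [f"https://example.com/page-{n}" for n in range(2500)]
chunks = split_into_partials(pages)
print([len(c) for c in chunks])  # [1000, 1000, 500]

# Hypothetical partial-sitemap naming, for illustration only.
partials = [f"https://example.com/sitemap-4seo-{i + 1}.xml" for i in range(len(chunks))]
print(build_index(partials))
```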

Search engines such as Google or Bing first read the index sitemap file at https://<yoursite>/sitemap-4seo.xml, then each partial sitemap listed in it.
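The first step a search engine takes can be sketched as follows: parse the index and collect the `<loc>` of every listed partial sitemap, which it then fetches in turn. The fetching is left out, and the sample URLs are placeholders. This uses Python's standard ElementTree with the sitemaps.org namespace and makes no claim about how any particular search engine is implemented.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def partial_sitemap_urls(index_xml: str) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]


sample = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/partial-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/partial-2.xml</loc></sitemap>
</sitemapindex>"""
print(partial_sitemap_urls(sample))
# ['https://example.com/partial-1.xml', 'https://example.com/partial-2.xml']
```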

You can follow the progress of search engines reading your sitemap on the Sitemaps page of the 4SEO admin panel.

On multilingual sites, pages in each language will be put in separate partial sitemaps.
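A minimal sketch of that grouping: pages are first grouped by language, then each language's list is split into partial sitemaps, so no partial file mixes languages. The language tags follow Joomla's usual format (for example "en-GB"). The function name and the input shape, a list of (language, url) pairs, are assumptions made for the example.

```python
from collections import defaultdict


def partials_by_language(pages: list[tuple[str, str]], size: int = 1000) -> dict[str, list[list[str]]]:
    """Group (language, url) pairs by language, then chunk each group.

    Each language gets its own partial sitemaps; `size` is the per-file limit.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for language, url in pages:
        grouped[language].append(url)
    return {
        language: [urls[i:i + size] for i in range(0, len(urls), size)]
        for language, urls in grouped.items()
    }


pages = [("en-GB", f"https://example.com/en/p{n}") for n in range(1500)]
pages += [("fr-FR", f"https://example.com/fr/p{n}") for n in range(300)]
result = partials_by_language(pages)
print({lang: [len(chunk) for chunk in chunks] for lang, chunks in result.items()})
# {'en-GB': [1000, 500], 'fr-FR': [300]}
```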

Trying to read the sitemap triggers a 404 error

The 4SEO sitemap, located at https://<yoursite>/sitemap-4seo.xml, is not a real file sitting on your web server's disk. Instead, it's rendered by Joomla just like any other page of your website.

However, some 3rd-party extensions, or your server's .htaccess file, can block requests for .xml or .txt files. If that happens, your web server displays a 404 error page when you try to access your sitemap.

Make sure your .htaccess file is configured to allow access to .xml files. Likewise, configure 3rd-party extensions such as Admin Tools so that search engines can read the 4SEO sitemap files normally.
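To tell the sitemap errors on this page apart, you can request the sitemap and look at the status code. The sketch below uses Python's standard `urllib` to fetch the status. It then maps it to the likely cause: 503 means the analysis is still running, while 404 means the request was blocked or never reached Joomla. The function names are hypothetical. Treating any Content-Type containing "xml" as a served sitemap is a heuristic.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Request `url` and return (HTTP status, Content-Type), including error statuses."""
    request = urllib.request.Request(url, headers={"User-Agent": "sitemap-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Content-Type", "")


def diagnose(status: int, content_type: str) -> str:
    """Map a sitemap response to the likely cause discussed on this page."""
    if status == 200 and "xml" in content_type.lower():
        return "ok: sitemap is being served"
    if status == 503:
        return "analysis not finished yet: 503 is expected, retry later"
    if status == 404:
        return "blocked or not routed to Joomla: check .htaccess and security extensions"
    return f"unexpected response {status} ({content_type or 'no content type'})"


# Usage (needs network access): diagnose(*fetch_status("https://<yoursite>/sitemap-4seo.xml"))
print(diagnose(404, "text/html"))
# blocked or not routed to Joomla: check .htaccess and security extensions
```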