hello
To improve SEO, I set the keyword menu link (tags component) to noindex, follow
I rebuilt the sitemap
I resubmitted the sitemap
but I still see the keyword in the sitemap
How to prevent Joomla tag pages from being indexed?
Regards
#10839 – no consideration of modification of menu links in the sitemap
Hi
I rebuilt the sitemap
I resubmitted the sitemap
but I still see the keyword in the sitemap
That will not change the content of the sitemap. Rebuilding the sitemap rebuilds it according to the current list of pages.
What you need to do is a Site analysis (but please read below, you probably don't want to update your sitemap right now). The list of pages is what decides what goes into the sitemap.
1 - Go to Pages | Settings | Site analysis
2 - Click Reset analysis
3 - Back to the Pages page, use the Analyze now button in the toolbar.
Let the analysis run. Once complete, a new sitemap will be created and submitted to Google (if you connected 4SEO to your Search Console account).
Note 1: 4SEO would eventually have updated the sitemap by itself, without a full manual analysis, but more slowly. This is one of the cases where you want to do a reset and a manual analysis.
Note 2: you probably do not want to update your sitemap immediately. See below.
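To check whether the tag pages are still listed after a new analysis, the sitemap can be inspected with a small script. This is a minimal sketch, assuming a standard sitemaps.org XML sitemap; the helper name `urls_under_prefix` and the sample content are illustrative and not part of 4SEO.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_under_prefix(sitemap_xml: str, prefix: str) -> list:
    """Return the sitemap URLs whose path starts with `prefix` (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if urlparse(u).path.startswith(prefix)]


# Illustrative sitemap content (assumption: a real sitemap would be downloaded first).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.eu/logiciel-libre/clam-antivirus-un-antivirus-gpl</loc></url>
  <url><loc>https://example.eu/mot-cle/tableur</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_under_prefix(SAMPLE_SITEMAP, "/mot-cle/"):
        print("still in sitemap:", url)
```

An empty result for the tag prefix would suggest the analysis has picked up the noindex setting.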
How to prevent Joomla tag pages from being indexed?
By adding a noindex tag, as you did. Google will then take some time to take that into account and remove the pages (one by one) from their index. They'll do it only after crawling the page again (very much like 4SEO must re-analyze the page to know that it has a noindex tag).
This is why you generally do not want to update your sitemap immediately: if you remove those pages from your sitemap immediately, then you tell Google these pages are not important. Because you tell Google they are not important, they likely will not come back and crawl the page soon. And so the pages will stay in the index.
By leaving the sitemap untouched for a few days or weeks (depending on the site's traffic and size), you increase your chances that Google will come and update their index more quickly.
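Before waiting on Google, it can help to confirm that a tag page really outputs the noindex directive. The following is a rough sketch that looks for a robots meta tag in a page's HTML; the class and function names are made up for illustration, and it does not look at the X-Robots-Tag HTTP header, which can also carry noindex.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```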
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Hi again,
Another note, just in case: do NOT exclude these pages in your robots.txt!
Exact same reason: if you exclude these pages in your robots.txt, Google will not crawl them. Therefore, they won't know that you added a noindex tag and the pages will stay in the index.
For the noindex tag to have an effect on a page, Google must crawl and analyze the page. If you block them from crawling, they can't read the page content and see the noindex.
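One way to see whether a robots.txt rule would stop a crawler from reaching a page is Python's standard `urllib.robotparser`. This is only an approximation under stated assumptions: Google's own parser may treat some edge cases differently, and the robots.txt content and URLs below are examples.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_crawl(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets 'Googlebot' fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Example robots.txt (assumption: tag pages live under /keyword/).
ROBOTS_TXT = """User-agent: *
Disallow: /administrator/
Disallow: /keyword/
"""

if __name__ == "__main__":
    # A blocked page is never fetched, so a noindex tag on it is never read.
    print(googlebot_can_crawl(ROBOTS_TXT, "https://example.eu/keyword/tableur"))
    print(googlebot_can_crawl(ROBOTS_TXT, "https://example.eu/logiciel-libre/clam-antivirus-un-antivirus-gpl"))
```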
Last one: you've used noindex, follow, but in this case the "follow" has no effect. After a few weeks or months, if the pages are still "noindex", Google also considers them "nofollow" (because they can't trust your links if they can't trust your content).
Best regards
Yannick Gaultier
weeblr.com / @weeblr
thank you for all this advice.
Ok I'll wait a few days to a few weeks. I don't want to send bad signals to Google
component tags
Wow, it's a bit complicated to understand because these are Joomla pages which have internal links to follow (good for SEO).
So what should I do?
1/ when the site starts, I have few links in a tag page
=> noindex, nofollow
2/ when the site has enough links for most of the tag pages :
=> index, follow
robots.txt
And in all cases, if I understand correctly, I have to remove this line from the robots.txt file:
Disallow: /keyword/
Hi
First off, understand that all of this is not part of support. These discussions are better had with a SEO consultant. I provide support on how to use 4SEO to achieve your SEO strategy, not on establishing this strategy.
That said:
Wow, it's a bit complicated to understand because these are Joomla pages which have internal links to follow (good for SEO).
Maybe, but that's not Google's concern. Their position is simple: if you have links on a page that we don't index, we can't trust these links. So we're not going to follow them. If a page has noindex, follow for a long time (i.e. it's not just temporary or a mistake), then we act as if it was noindex, nofollow. We just stop considering the page entirely.
So what should I do?
1/ when the site starts, I have few links in a tag page
=> noindex, nofollow
2/ when the site has enough links for most of the tag pages :
=> index, follow
That's your choice, but what I don't understand is why you have these tag links. Shouldn't all these pages be accessible through normal menus and navigation through categories? What do the tag links add to this?
robots.txt
And in all cases, if I understand correctly, I have to remove this line from the robots.txt file:
Disallow: /keyword/
If you want the noindex to be taken into account, then Google has to crawl the page. If your tag pages start with /keyword/, then this line in robots.txt prevents Google from crawling the pages and they cannot see that you added a noindex to them. So yes, if you want to noindex these pages, you must not block them in robots.txt.
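Removing the rule is a one-line edit, but for illustration here is a small sketch that drops an exact Disallow rule from robots.txt text. The function name is hypothetical, it removes the rule from every user-agent group, and it leaves commented-out lines (such as the `#4SEO-opt` ones) untouched.

```python
def drop_disallow_rule(robots_txt: str, path: str) -> str:
    """Return robots.txt text without any 'Disallow: <path>' line (exact path match)."""
    kept = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        # Only active Disallow lines match; commented lines have a different field.
        if field.strip().lower() == "disallow" and value.strip() == path:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    current = "User-agent: *\nDisallow: /administrator/\nDisallow: /keyword/\n"
    print(drop_disallow_rule(current, "/keyword/"))
```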
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Thank you for all your advice.
A good link is better than a long speech :
https://example.eu/mot-cle/tableur
A keyword page in Joomla is essentially URLs. Good for internal mapping but meaningless. Hence the people who tell me to set noindex but allow Google to visit the links. What's not easy is that I am getting contradictory answers at the moment (notably to add Disallow: /keyword/), but OK, I'm going to take it off!
Yes, I also ask my questions on the webrankinfo forum (among others) or by looking on the web (an association cannot afford to pay a consultant)
I wonder if your extension 4AI could offer optimization advice in the future on these kinds of questions?
Hi
Good for internal mapping but meaningless
A tag page is not good for internal linking. At all. Because the tag page itself has no value. It has no content, so it has exactly zero value and does not pass any value onto the linked pages.
A keyword page in Joomla is essentially urls.
Exactly. The page itself has no content, no value. Therefore in terms of SEO, it's not useful.
allow google to visit links.
Why would that be interesting in any way? Do you not have any other links to these pages, through menus and categories?
There's usually no need to have tags, unless you have a navigation structure problem. Your page should be reachable without tags, through regular navigation.
And if you have a proper navigation (menus and categories), then:
- you do not need tag pages (because Google will find the pages through the normal links)
- you should not have tag pages (because they destroy the structure of your site, adding another set of links)
What's not easy is that I have contradictory answers at the moment. (notably by putting: Disallow: /keyword/)
If you have /keyword/ in your robots.txt, Google will not crawl your page and therefore won't see your noindex. It's SEO 101.
It's actually even in Google documentation:
Yes, I also ask my questions on the webrankinfo forum (among others) or by looking on the web (an association cannot afford to pay a consultant)
That's the problem with forums though: anyone can reply and you don't know the quality of that reply.
I wonder if your extension 4AI could offer optimization advice in the future on these kinds of questions?
No more than ChatGPT. 4AI is about helping you create better content. It won't advise you on robots.txt or noindex. Well, you can chat with ChatGPT as part of the user interface, which may be more convenient than going to the ChatGPT website, but you'll get the same response as ChatGPT - with the ability to use GPT4.
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Hi
Thank you for all your advice.
Ah, someone who knows more than I do told me this about your suggestion not to put Disallow: /keyword/
His reply:
the pages are indeed ALREADY indexed, but this is not the case. And since we want it not to index them and not consider them as potential, we prohibit crawling.
because these urls with "" in the page indexing console > Crawled, currently not indexed1
Ok I'm going to look at the 4AI presentation again because I thought it suggested the title, description tags...
Regards
Hi
Ah, as someone who knows better than me told me about your suggestion not to put Disallow: /keyword/
It's not a suggestion. It's exactly what the Google documentation says. In red and in bold.
because these urls with "" in the page indexing console > Crawled, currently not indexed1
This means that today Google has decided not to index that page. Tomorrow they can decide to do it (for instance, if they find somebody linking to that tag page).
The only way to prevent that is to add a noindex. And the only way for Google to know if that page has a noindex tag is if they re-crawl it. And they won't recrawl it if you block crawling.
And since we want it not to index them and not consider them as potential, we prohibit crawling.
And that's still completely wrong, whoever says it. Maybe you can ask your "SEO specialist" how Google is supposed to know you want that page noindexed if they are not allowed to crawl it?
I'm waiting for the answer from that person, please post it here.
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Hi again
If it helps, here are the link and the screenshot to the same Google documentation page in French.
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Hi
Thank you for this reference
I had read it well but (as I am in panic mode, due to the lack of improvement in SEO despite actions including the purchase of 4SEO), his remark made me think that it was a different case and that we could add this exception like the basic ones:
Disallow: /administrator/
Disallow: /api/
Disallow: /bin/
#4SEO-opt Disallow: /cache/
Disallow: /cli/
#4SEO-opt Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
#4SEO-opt Disallow: /modules/
#4SEO-opt Disallow: /plugins/
Disallow: /tmp/
Disallow: /en/
Hi
as I am in panic mode, due to the lack of improvement in SEO despite actions including the purchase of 4SEO
Getting 4SEO ensures that:
- you have the technical basics right
- you can use 4SEO tools to do the SEO operations that you decide to apply (noindexing some areas, doing redirects, etc.)
- you have OpenGraph and TwitterCards so that people can easily share your content on social networks
- you can possibly add structured data to your content
But there's no magic and unless you have something terribly wrong and Google can't access your site or something, you can't expect a change, especially a quick one, by adding noindex to your tag pages.
For instance, as you pointed out yourself, these tag pages are already not indexed by Google ("Crawled, currently not indexed1") so adding a "noindex" tag is good and should be done, but that's not going to change anything.
As usual in SEO, there are 2 things you can act upon:
- content
- links (and promotion in general)
You have not shared the site URL so I can't comment on that (and that's not my role) but in my experience of the last year or so:
- Google tends to prefer larger sites, with established brands. Smaller sites must have a lot of excellent content to perform well
- links are just as important as before, and having links pointing at your site matters a lot
Most smaller sites I know have been hit hard by the "Helpful Content Updates" of the last 18 months. So even more than before, generic content does not perform well.
his remark made me think that it was a different case
If you are referring to the remark about "since we want it not to index them and not consider them as potential, we prohibit crawling", sorry to be blunt but that person does not appear to know what they are talking about.
You can play with your site robots.txt for weeks. That won't change anything.
- Blocking access in robots.txt is not the same as noindexing. Even if you block access to a page in robots.txt, Google can still index it (if it finds a link to it on another website)
- unless your site has several thousands of pages, all this noindexing is of no use. Google already crawled your site and they already decided not to index some pages.
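The difference between blocking and noindexing, as described in this thread, can be summed up in a tiny decision function. This is a simplified model of the behaviour discussed here, not a statement of how Google's systems are implemented; the function name and the wording of the outcomes are illustrative.

```python
def expected_outcome(blocked_in_robots_txt: bool, has_noindex: bool) -> str:
    """Simplified model of what happens to a page, per the reasoning in this thread."""
    if blocked_in_robots_txt:
        # The page is never fetched, so any noindex tag on it is never read.
        return "not crawled: noindex is not seen, page may still be indexed via external links"
    if has_noindex:
        return "crawled: dropped from the index after the next crawl"
    return "crawled: Google decides whether to index the page"


if __name__ == "__main__":
    for blocked in (True, False):
        for noindex in (True, False):
            print(blocked, noindex, "->", expected_outcome(blocked, noindex))
```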
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Hi
OK, I trust your arguments and sources. The site has already been mentioned (https://example.eu/), but I understand that you do not do an SEO study.
Hi
So first the sitemap lists about 280 pages, which means it's not large enough to have crawling issues. So I very strongly doubt anything you do with noindexing content (which you should do anyway) will have a quick or real effect on Google not indexing some pages.
In short, your issue has more to do with Google not "liking" your content rather than something technical or otherwise which would prevent them from indexing it.
I just had a quick look and I do find that many pages are a little bit low on content, for instance:
- https://example.eu/logiciel-libre/clam-antivirus-un-antivirus-gpl
- https://example.eu/logiciel-libre/thunderbird-un-logiciel-pour-rendre-votre-messagerie-plus-facile
- https://example.eu/logiciel-libre/kodi-media-center-pour-gerer-toutes-vos-videos-musique
All these are more like a datasheet with one or 2 small paragraphs of text and then an item list of characteristics.
I also saw a couple of articles where most of the article is an embedded PDF, with only a couple of lines of introduction. So that's in fact very little usable, easily accessible content.
But that's really where a deeper analysis, by a qualified SEO specialist, would be needed to understand the problem, or how things can be improved. And also remember that a lot of "SEO" is not about your site but about sharing content, making the content known to others, getting backlinks,...
Best regards
Yannick Gaultier
weeblr.com / @weeblr
Hi
Ok Thanks
Hi
You're welcome! Closing this ticket now, feel free to open a new one as needed. If you do so, please mention this ticket number in the new one.
If you created any superadmin account for us, be sure to delete or block it now to avoid unnecessary risk in the future.
Best regards
Yannick Gaultier
weeblr.com / @weeblr