#9958 – Exclude urls based on regex?

Posted in ‘4SEO’
Friday, 26 May 2023 08:38 UTC
iusab

I have a phpBB forum that is linked through the p8pbb component.

4SEO is crawling tens of thousands of pages like

forum/user/resend_act?sid=SESSIONIDHERE

How can I exclude specific URLs from crawling, with a regex or some other way?

Thanks!

Friday, 26 May 2023 09:50 UTC
wb_weeblr

Hi

You don't need a regular expression; the wildcard characters are enough, something like:

forum/user/{*}

or

forum/user/resend{*}

Note that you'll need to "Reset analysis" for these settings to take effect. Existing data won't be removed just by adding the exclusion rules.

Since the purpose of 4SEO is SEO, personally I would use the first one, which would exclude all "user" related pages; these likely have no SEO value.

You can also likely find other pages that should be excluded with a few simple rules, for instance the various URLs related to the same post, such as "edit post", "delete post", "bookmark post", "report post", etc.
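To make the effect of the two rules above easier to picture, here is a minimal Python sketch. It is not 4SEO's code: it assumes that `{*}` stands for "any run of characters" and that a rule is compared against the URL path and query relative to the site root. The helper names `wildcard_to_regex` and `is_excluded` are made up for this illustration; the actual matching behaviour is the one described in the 4SEO documentation.

```python
import re


def wildcard_to_regex(rule: str) -> re.Pattern:
    """Turn a rule containing {*} tokens into an anchored regex (assumed semantics)."""
    # Everything outside the {*} token is treated as literal text.
    literal_parts = rule.split("{*}")
    pattern = ".*".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + pattern + "$")


def is_excluded(url: str, rules: list[str]) -> bool:
    """Return True if the relative URL matches at least one exclusion rule."""
    return any(wildcard_to_regex(rule).match(url) is not None for rule in rules)


# Hypothetical URLs, modelled on the one in the question.
print(is_excluded("forum/user/resend_act?sid=abc", ["forum/user/{*}"]))
print(is_excluded("forum/viewtopic.php?t=1", ["forum/user/{*}"]))
print(is_excluded("forum/user/profile", ["forum/user/resend{*}"]))
```

In this sketch the broader rule catches every URL under forum/user/, while the narrower one only catches URLs starting with forum/user/resend. A bare `*` without the curly braces is treated here as a literal character, so it would not act as a wildcard.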

Best regards

Yannick Gaultier

weeblr.com / @weeblr

Friday, 26 May 2023 09:51 UTC
wb_weeblr

Hi again,

Relevant documentation on 4SEO analysis exclusions is here.

Best regards

Yannick Gaultier

weeblr.com / @weeblr

Friday, 26 May 2023 10:44 UTC
iusab

So basically 4SEO uses the same robots.txt as search engines, and the same syntax?

Do I really need brackets?

forum/user/{*}

or is it OK with

forum/user/*

Are there any drawbacks with "Reset analysis"? Is it better to just delete entries in the database tables manually?

These changes do not affect 4SEF. Correct?

Friday, 26 May 2023 14:02 UTC
wb_weeblr

Hi

> So basically 4SEO uses the same robots.txt as search engines, and the same syntax?
>
> Do I really need brackets?

This is computing: you need the exact syntax I used. That syntax is used in all parts of 4SEO, for setting any rule (redirects, metadata, structured data, etc.). More details and examples are in this paragraph of the documentation.

> Are there any drawbacks with "Reset analysis"? Is it better to just delete entries in the database tables manually?

Do NOT delete anything from the database manually. Ever. You'll most certainly kill the application. Data is stored in different tables, with relationships defined between multiple tables. Like most things in Joomla, it's not meant to be managed manually.

There's no issue with resetting the analysis. It's actually a must each time you make significant changes to the site, for instance changing the URL structure, changing robots.txt, etc.

> These changes do not affect 4SEF. Correct?

4SEF and 4SEO are not related.

Best regards

Yannick Gaultier

weeblr.com / @weeblr

Friday, 26 May 2023 20:43 UTC
iusab

Oh! I have now found the settings button for page analysis (I thought it was the same as the robots.txt settings page). Of course the 4SEO "Excluded URLs" part uses 4SEO syntax, but if I also want to keep other bots away, I can use the robots.txt settings part and its syntax.

Thanks for your swift responses and help Yannick!

Have a great weekend!

Monday, 29 May 2023 08:21 UTC
wb_weeblr

Hi

> Oh! I have now found the settings button for page analysis (I thought it was the same as the robots.txt settings page). Of course the 4SEO "Excluded URLs" part uses 4SEO syntax, but if I also want to keep other bots away, I can use the robots.txt settings part and its syntax.

Exactly! The 4SEO syntax is simpler and more powerful than the robots.txt one, which makes it much easier to exclude or include specific parts of the site.

Also, the two are (or can be) disconnected. You may want some parts of the site to be indexed by Google but not have 4SEO spend time on them (because you know you won't be doing any SEO work on these pages).
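For the robots.txt side, a rule can be checked before it is published. The sketch below is independent of 4SEO and uses Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs and the helper name `allowed_by_robots` are assumptions made for this illustration. Note that this parser does plain prefix matching on the path, so the rule is written without wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that keeps all bots out of the forum user pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/user/
"""


def allowed_by_robots(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt content lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed_by_robots("https://example.com/forum/user/resend_act?sid=abc"))
print(allowed_by_robots("https://example.com/forum/viewtopic.php?t=1"))
```

This only covers what well-behaved bots are told through robots.txt. Whether 4SEO analyzes a page is still controlled separately by its own "Excluded URLs" rules.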

Closing this ticket now, feel free to open a new one as needed. If you do so, please mention this ticket number in the new one.

Best regards

Yannick Gaultier

weeblr.com / @weeblr
