Startpage Engines
Startpage’s language & region selectors are a mess ..
Startpage regions
In the list of regions there are tags we need to map to common region tags:
pt-BR_BR --> pt_BR
zh-CN_CN --> zh_Hans_CN
zh-TW_TW --> zh_Hant_TW
zh-TW_HK --> zh_Hant_HK
en-GB_GB --> en_GB
and there is at least one tag with a three letter language tag (ISO 639-2):
fil_PH --> fil_PH
The locale code no_NO from Startpage does not exist in babel and is therefore mapped to nb-NO. Babel raises:
babel.core.UnknownLocaleError: unknown locale 'no_NO'
For reference see languages-subtag at IANA; no is the macrolanguage [1] and W3C recommends subtag over macrolanguage [2].
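The mappings listed above can be sketched as a small lookup table. This is only an illustration: the names REGION_TAG_MAP and normalize_region_tag are hypothetical and not part of the engine, and the table contains nothing beyond the tags named in this section.

```python
# Hypothetical lookup table built only from the tags listed in this section.
REGION_TAG_MAP = {
    "pt-BR_BR": "pt_BR",
    "zh-CN_CN": "zh_Hans_CN",
    "zh-TW_TW": "zh_Hant_TW",
    "zh-TW_HK": "zh_Hant_HK",
    "en-GB_GB": "en_GB",
    # no_NO is unknown to babel; the text above maps it to nb-NO.
    "no_NO": "nb-NO",
}


def normalize_region_tag(startpage_tag: str) -> str:
    """Return the common region tag for a Startpage region tag.

    Tags without a special case (for example fil_PH) are returned unchanged.
    """
    return REGION_TAG_MAP.get(startpage_tag, startpage_tag)
```

A tag such as fil_PH has no entry in the table and passes through unchanged.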
Startpage languages
send_accept_language_header:
The displayed names in Startpage’s settings page depend on the location of the IP when the Accept-Language HTTP header is unset. In fetch_traits we use:
'Accept-Language': "en-US,en;q=0.5", ..
to get uniform names independent of the IP.
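As a rough illustration of pinning the header, the sketch below builds a request object with the Accept-Language value quoted above. It uses the standard library's urllib rather than the HTTP client the engine actually uses, and the function name build_settings_request and the URL passed in are assumptions made for the example. No network request is sent.

```python
import urllib.request


def build_settings_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request with a fixed Accept-Language header."""
    headers = {
        # Value quoted in the text above; it keeps the language names
        # returned by Startpage independent of the client's IP location.
        "Accept-Language": "en-US,en;q=0.5",
    }
    return urllib.request.Request(url, headers=headers)
```

Note that urllib stores header names in capitalized form, so the header is read back with `req.get_header("Accept-language")`.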
Startpage categories
Startpage’s category (for Web-search, News, Videos, ..) is set by startpage_categ in settings.yml:
- name: startpage
  engine: startpage
  startpage_categ: web
  ...
Hint
The default category is web, and categories other than web are not yet implemented.
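Because only web is implemented, a configuration loader could reject other values early. The following is a minimal sketch under that assumption. IMPLEMENTED_CATEGS and check_startpage_categ are hypothetical names and the engine may handle this differently.

```python
# Hypothetical guard: only the category documented as implemented is accepted.
IMPLEMENTED_CATEGS = ("web",)


def check_startpage_categ(categ: str = "web") -> str:
    """Return categ if it is implemented, otherwise raise ValueError."""
    if categ not in IMPLEMENTED_CATEGS:
        raise ValueError(
            f"startpage_categ {categ!r} is not implemented, "
            f"use one of {IMPLEMENTED_CATEGS}"
        )
    return categ
```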
- searx.engines.startpage.fetch_traits(engine_traits: EngineTraits)
- searx.engines.startpage.get_sc_code(searxng_locale, params)
Get a current sc argument from Startpage’s search form (HTML page).
Startpage puts a sc argument on every HTML search form. Without this argument Startpage considers the request to be from a bot. We do not know what is encoded in the value of the sc argument, but it seems to be a kind of time-stamp.
Startpage’s search form generates a new sc-code on each request. This function scrapes a new sc-code from Startpage’s home page every sc_code_cache_sec seconds.
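The caching behaviour described above amounts to a time-limited in-memory cache. The sketch below shows that idea only. The class ScCodeCache, the injected fetch callable (standing in for the actual scraping of the home page), and the injectable clock are all assumptions made for illustration, not the engine's implementation.

```python
import time
from typing import Callable, Optional

sc_code_cache_sec = 30  # same default as documented for the engine


class ScCodeCache:
    """Keep one sc-code in memory and refresh it after ttl seconds."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl: float = sc_code_cache_sec,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical: scrapes a fresh sc-code
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        expired = now - self._fetched_at > self._ttl
        if self._value is None or expired:
            # Cache is empty or too old: fetch a new code and remember when.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Passing the clock in as a parameter keeps the sketch testable without waiting for real time to elapse.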
- searx.engines.startpage.request(query, params)
Assemble a Startpage request.
To avoid a CAPTCHA we need to send a well-formed HTTP POST request with a cookie. We need to form a request that is identical to the request built by Startpage’s search form:
in the cookie the region is selected
in the HTTP POST data the language is selected
Additionally, the arguments from Startpage’s search form need to be set in the HTTP POST data; compare the <input> elements in search_form_xpath.
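Copying the form's <input> elements into the POST data can be sketched as follows. The engine selects the form with an XPath expression. This illustration swaps in the standard library's html.parser so it runs without extra dependencies. The class and function names and the POST field names query and language are assumptions, not taken from the engine.

```python
from html.parser import HTMLParser


class SearchFormInputs(HTMLParser):
    """Collect name/value pairs of <input> elements inside <form id="search">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "search":
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # An attribute without a value is reported as None.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_post_data(form_html: str, query: str, language: str) -> dict:
    """Start from the form's own arguments, then set the user's values."""
    parser = SearchFormInputs()
    parser.feed(form_html)
    data = dict(parser.fields)
    data["query"] = query        # hypothetical field name
    data["language"] = language  # hypothetical field name
    return data
```

Starting from the scraped fields means hidden arguments such as sc are carried over without the caller having to know about them.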
- searx.engines.startpage.max_page = 18
Tested 18 pages maximum (argument page), to be safe max is set to 20.
- searx.engines.startpage.sc_code_cache_sec = 30
Time in seconds the sc-code is cached in memory (see get_sc_code).
- searx.engines.startpage.search_form_xpath = '//form[@id="search"]'
XPath of Startpage’s original search form.
- searx.engines.startpage.send_accept_language_header = True
Startpage tries to guess the user’s language and territory from the HTTP Accept-Language header. Optionally, the user can select a search language (which can differ from the UI language) and a region filter.
- searx.engines.startpage.startpage_categ = 'web'
Startpage’s category, see Startpage categories.