Results 11 - 20 of 134 for sitemap (0.1 sec)
fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/SitemapsHelper.java
/**
 * Helper class for parsing and validating sitemaps.
 * It supports XML sitemaps, XML sitemap indexes, and text sitemaps,
 * and can handle GZIP compressed sitemaps.
 * The class provides methods to check if an input stream is a valid sitemap,
 * and to parse an input stream into a {@link SitemapSet} object.
 * It uses SAX parser for XML sitemaps and XML sitemap indexes,
 * and handles potential exceptions during parsing.
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 14.7K bytes - Viewed (0) -
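The Javadoc above says the helper can handle GZIP compressed sitemaps. One common way to do that is to peek at the first two bytes of the stream and compare them with the GZIP magic number. The sketch below illustrates that technique only; it is not taken from `SitemapsHelper`, and the class and method names (`GzipSniffDemo`, `maybeDecompress`, `readText`, `gzip`) are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of fess-crawler.
public class GzipSniffDemo {

    /** Returns a decompressing stream if the input starts with the GZIP magic bytes (0x1f 0x8b). */
    static InputStream maybeDecompress(final InputStream in) throws IOException {
        final BufferedInputStream bis = new BufferedInputStream(in);
        bis.mark(2); // remember the start so the two peeked bytes can be re-read
        final int b1 = bis.read();
        final int b2 = bis.read();
        bis.reset();
        // GZIP_MAGIC is 0x8b1f: the second byte is the high byte, the first byte is the low byte.
        final boolean gzip = b1 >= 0 && b2 >= 0 && ((b2 << 8) | b1) == GZIPInputStream.GZIP_MAGIC;
        return gzip ? new GZIPInputStream(bis) : bis;
    }

    /** Reads a (possibly compressed) body as UTF-8 text. */
    static String readText(final byte[] body) {
        try (InputStream in = maybeDecompress(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: compresses bytes with GZIP. */
    static byte[] gzip(final byte[] data) {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(final String[] args) {
        final byte[] plain = "http://www.example.com/\n".getBytes(StandardCharsets.UTF_8);
        // Both calls should yield the same text sitemap content.
        System.out.print(readText(plain));
        System.out.print(readText(gzip(plain)));
    }
}
```

Sniffing the content rather than trusting the `.gz` extension means a compressed sitemap served under a plain `.xml` URL would still be read correctly under this approach.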
fess-crawler/src/main/java/org/codelibs/fess/crawler/rule/impl/SitemapsRule.java
 * represents a valid sitemap. It uses a SitemapsHelper to validate the response body as an InputStream.
 * The rule checks if the URL matches the defined regex pattern and then validates the content as a sitemap.
 * If any exception occurs during the sitemap validation, it logs the error and returns false.
 *
 */
public class SitemapsRule extends RegexRule {
    /**
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 2.6K bytes - Viewed (0) -
src/main/resources/crawler/rule.xml
        <component class="org.codelibs.fess.crawler.processor.impl.SitemapsResponseProcessor">
        </component>
    </property>
    <postConstruct name="addRule">
        <arg>"url"</arg>
        <arg>"http[s]?:.*sitemap[^/]*\.xml.*|http[s]?:.*sitemap[^/]*\.gz.*|http[s]?:.*sitemap[^/]*\.txt.*"</arg>
    </postConstruct>
</component>
<component name="webHtmlRule" class="org.codelibs.fess.crawler.rule.impl.RegexRule" >
    <property name="ruleId">"webHtmlRule"</property>
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Thu Jun 04 08:42:49 UTC 2020 - 4.6K bytes - Viewed (0) -
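To see which URLs the `url` pattern in this `rule.xml` would accept, the expression can be tried in isolation. The sketch below copies the pattern string from the snippet and assumes it is applied as a full match (`Matcher.matches()`); how `RegexRule` actually applies it is not shown in these results, and `SitemapUrlRuleDemo` and `isSitemapUrl` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of fess-crawler.
public class SitemapUrlRuleDemo {

    // Pattern string copied from the "url" rule argument in rule.xml.
    static final Pattern SITEMAP_URL = Pattern.compile(
            "http[s]?:.*sitemap[^/]*\\.xml.*"
                    + "|http[s]?:.*sitemap[^/]*\\.gz.*"
                    + "|http[s]?:.*sitemap[^/]*\\.txt.*");

    /** Assumption: the rule requires the whole URL to match the pattern. */
    static boolean isSitemapUrl(final String url) {
        return SITEMAP_URL.matcher(url).matches();
    }

    public static void main(final String[] args) {
        final String[] urls = {
            "https://example.com/sitemap.xml",
            "https://example.com/sitemap_index.xml.gz",
            "https://example.com/sitemap.txt",
            "http://www.example.com/sitemap/",
            "https://example.com/sitemap/pages.xml",
        };
        for (final String url : urls) {
            System.out.println(url + " -> " + isSitemapUrl(url));
        }
    }
}
```

Because of `[^/]*`, the extension has to appear in the same path segment as the word `sitemap`, so under the full-match assumption `http://www.example.com/sitemap/` and `https://example.com/sitemap/pages.xml` are rejected while the first three URLs are accepted. The looser `.*sitemap.*` pattern seen elsewhere in these results would accept all five.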
fess-crawler/src/test/java/org/codelibs/fess/crawler/rule/impl/RuleManagerImplTest.java
}

public void test_getRule_sitemaps5() {
    final ResponseData responseData = new ResponseData();
    responseData.setUrl("http://www.example.com/sitemap/");
    File file = ResourceUtil.getResourceAsFile("sitemaps/sitemap1.xml");
    responseData.setResponseBody(file, false);
    final Rule rule = ruleManager.getRule(responseData);
    assertNotNull(rule);
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sat Mar 15 06:52:00 UTC 2025 - 6.2K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/CrawlerContext.java
}

/**
 * Adds sitemaps to the thread-local storage.
 * @param sitemaps An array of sitemap URLs.
 */
public void addSitemaps(final String[] sitemaps) {
    sitemapsLocal.set(sitemaps);
}

/**
 * Removes sitemaps from the thread-local storage and returns them.
 * @return An array of sitemap URLs, or null if none were present.
 */
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 8.9K bytes - Viewed (0) -
fess-crawler/src/test/resources/org/codelibs/fess/crawler/helper/robots.txt
User-agent: Crawler
Disallow: /aaa
User-agent: Crawler/1.0
Disallow: /bbb
User-agent: Crawler/2.0
Disallow: /ccc
User-agent: Hoge Crawler
Disallow: /ddd
sitemap: http://www.example.com/sitmap.xml
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Oct 11 02:16:55 UTC 2015 - 566 bytes - Viewed (0) -
fess-crawler/src/test/resources/sitemaps/sitemap1.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Oct 11 02:16:55 UTC 2015 - 915 bytes - Viewed (0) -
README.md
controller.setDefaultIntervalTime(1000); }); ``` ### Sitemap Support ```java // Enable sitemap processing container.singleton("sitemapsRule", SitemapsRule.class, rule -> { rule.addRule("url", ".*sitemap.*"); }); // Add sitemap URL crawler.addUrl("https://example.com/sitemap.xml"); ``` ## Data Access and Storage ### Accessing Crawled Data ```java
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/RobotsTxtHelper.java
protected static final Pattern CRAWL_DELAY_RECORD = Pattern.compile("^crawl-delay:\\s*([^\\s]+)\\s*$", Pattern.CASE_INSENSITIVE);

/**
 * Pattern for Sitemap record.
 */
protected static final Pattern SITEMAP_RECORD = Pattern.compile("^sitemap:\\s*([^\\s]+)\\s*$", Pattern.CASE_INSENSITIVE);

/** Whether robots.txt processing is enabled. */
protected boolean enabled = true;

/**
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 7.7K bytes - Viewed (0) -
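The `SITEMAP_RECORD` pattern above can be exercised on its own against lines such as those in the `robots.txt` test resource listed earlier. The sketch below copies the pattern verbatim; the surrounding line splitting and trimming are assumptions, not the actual `RobotsTxtHelper` parsing loop, and `SitemapRecordDemo` and `extractSitemaps` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of fess-crawler.
public class SitemapRecordDemo {

    // Pattern copied from the RobotsTxtHelper snippet.
    static final Pattern SITEMAP_RECORD =
            Pattern.compile("^sitemap:\\s*([^\\s]+)\\s*$", Pattern.CASE_INSENSITIVE);

    /** Collects the URL of every "sitemap:" line. Per-line trimming is an assumption. */
    static List<String> extractSitemaps(final String robotsTxt) {
        final List<String> urls = new ArrayList<>();
        for (final String rawLine : robotsTxt.split("\\R")) {
            final Matcher matcher = SITEMAP_RECORD.matcher(rawLine.trim());
            if (matcher.find()) {
                urls.add(matcher.group(1)); // group 1 is the run of non-whitespace after "sitemap:"
            }
        }
        return urls;
    }

    public static void main(final String[] args) {
        final String robotsTxt = String.join("\n",
                "User-agent: Crawler",
                "Disallow: /aaa",
                "sitemap: http://www.example.com/sitmap.xml");
        System.out.println(extractSitemaps(robotsTxt));
    }
}
```

Because the pattern is compiled with `Pattern.CASE_INSENSITIVE` and allows zero whitespace after the colon, `Sitemap:http://...` is accepted as well. A value containing an internal space is rejected, since `[^\\s]+` must be followed only by trailing whitespace.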
fess-crawler-lasta/src/main/resources/crawler/rule.xml
        <component class="org.codelibs.fess.crawler.processor.impl.SitemapsResponseProcessor">
        </component>
    </property>
    <postConstruct name="addRule">
        <arg>"url"</arg>
        <arg>".*sitemap.*"</arg>
    </postConstruct>
</component>
<component name="fileRule" class="org.codelibs.fess.crawler.rule.impl.RegexRule">
    <property name="ruleId">"fileRule"</property>
    <property name="defaultRule">true</property>
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Oct 11 02:16:55 UTC 2015 - 1.5K bytes - Viewed (0)