Results 1 - 5 of 5 for patternset (0.16 sec)
fess-crawler/src/test/java/org/codelibs/fess/crawler/filter/UrlFilterTest.java
assertFalse(urlFilter.match("https://other.com/page.html"));
}

/**
 * Test match with no patterns configured
 */
public void test_match_noPatterns() {
    String sessionId = "test-session-009";
    urlFilter.init(sessionId);
    // Without any patterns, all URLs should match
    assertTrue(urlFilter.match("https://example.com/"));
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Wed Sep 03 14:42:53 UTC 2025 - 19K bytes
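The test above expects a filter with no configured patterns to accept every URL. The following is a minimal sketch of that behavior, assuming include/exclude lists of regular expressions in which an empty include list accepts everything and any exclude match rejects the URL. `SimpleUrlFilter` is a hypothetical class for illustration, not fess-crawler's `UrlFilter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical include/exclude URL filter; not the fess-crawler UrlFilter API. */
class SimpleUrlFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    void addInclude(String regex) {
        includes.add(Pattern.compile(regex));
    }

    void addExclude(String regex) {
        excludes.add(Pattern.compile(regex));
    }

    /** Returns true when the URL should be crawled. */
    boolean match(String url) {
        // An empty include list accepts every URL (the "no patterns" case).
        boolean included = includes.isEmpty()
                || includes.stream().anyMatch(p -> p.matcher(url).matches());
        if (!included) {
            return false;
        }
        // Any exclude match rejects the URL, even if an include matched.
        return excludes.stream().noneMatch(p -> p.matcher(url).matches());
    }

    public static void main(String[] args) {
        SimpleUrlFilter filter = new SimpleUrlFilter();
        System.out.println(filter.match("https://example.com/")); // true: no patterns configured

        filter.addInclude("https://example.com/.*");
        System.out.println(filter.match("https://example.com/page.html")); // true
        System.out.println(filter.match("https://other.com/page.html"));   // false
    }
}
```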
README.md
crawler.crawlerContext.setDefaultIntervalTime(1000); // 1 second
```

### URL Filtering

```java
// Include patterns
crawler.urlFilter.addInclude("https://example.com/.*");
crawler.urlFilter.addInclude(".*\\.pdf$");

// Exclude patterns
crawler.urlFilter.addExclude(".*\\.js$");
crawler.urlFilter.addExclude(".*login.*");
```

## Supported Protocols and Formats
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes
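In the README's patterns, `\\.` in Java source is the regex `\.` (a literal dot) and `$` anchors the end of the URL, so `.*\\.pdf$` accepts `report.pdf` but not `reportXpdf`. The sketch below runs those exact strings through `java.util.regex.Pattern.matches`, assuming (the excerpt does not say) that the crawler matches against the whole URL and that an exclude match overrides an include. `ReadmePatternCheck` is a hypothetical helper, not part of the crawler.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical check of the README's pattern strings; whole-string matching is assumed. */
class ReadmePatternCheck {
    static final List<String> INCLUDES = List.of("https://example.com/.*", ".*\\.pdf$");
    static final List<String> EXCLUDES = List.of(".*\\.js$", ".*login.*");

    static boolean anyMatch(List<String> regexes, String url) {
        // Pattern.matches requires the regex to match the entire URL.
        return regexes.stream().anyMatch(r -> Pattern.matches(r, url));
    }

    /** Accepted when some include pattern matches and no exclude pattern does. */
    static boolean accepted(String url) {
        return anyMatch(INCLUDES, url) && !anyMatch(EXCLUDES, url);
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://example.com/docs/index.html", // first include -> true
            "https://files.test/report.pdf",       // second include -> true
            "https://example.com/app.js",          // excluded by .js rule -> false
            "https://example.com/login?next=/",    // excluded by login rule -> false
            "https://files.test/reportXpdf"        // escaped dot requires a literal "." -> false
        };
        for (String url : urls) {
            System.out.println(url + " -> " + accepted(url));
        }
    }
}
```

Under whole-string matching, the leading `.*` is what lets `.*login.*` and `.*\\.pdf$` match text in the middle or at the end of a URL.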
fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/RobotsTxt.java
 * the most specific (longest) match is used.</p>
 *
 */
public class RobotsTxt {
    private static final String ALL_BOTS = "*";

    /** Map of user agent patterns to their corresponding directives. */
    protected final Map<Pattern, Directive> directiveMap = new LinkedHashMap<>();

    /** List of sitemap URLs found in the robots.txt file. */
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 10K bytes
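The Javadoc states that the most specific (longest) match is used. A minimal sketch of longest-match resolution follows, assuming plain path-prefix rules without the `*` and `$` wildcards, ignoring the user-agent lookup through `directiveMap`, and breaking an equal-length allow/disallow tie in favor of allow, as RFC 9309 recommends. `PathRules` is hypothetical and does not mirror `RobotsTxt`'s own `Directive` type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical allow/disallow rules resolved by longest-prefix match (no wildcard support). */
class PathRules {
    private final List<String> allows = new ArrayList<>();
    private final List<String> disallows = new ArrayList<>();

    void allow(String prefix) {
        allows.add(prefix);
    }

    void disallow(String prefix) {
        // An empty "Disallow:" value disallows nothing, so it is not stored.
        if (!prefix.isEmpty()) {
            disallows.add(prefix);
        }
    }

    /** Length of the longest rule that prefixes the path, or -1 if none applies. */
    private static int longestMatch(List<String> rules, String path) {
        int best = -1;
        for (String rule : rules) {
            if (path.startsWith(rule) && rule.length() > best) {
                best = rule.length();
            }
        }
        return best;
    }

    /** The most specific rule wins; an equal-length tie goes to allow. */
    boolean isAllowed(String path) {
        int allowLen = longestMatch(allows, path);
        int disallowLen = longestMatch(disallows, path);
        // With no matching disallow, disallowLen is -1 and the path is allowed.
        return allowLen >= disallowLen;
    }

    public static void main(String[] args) {
        PathRules rules = new PathRules();
        rules.disallow("/private/");
        rules.allow("/private/public-docs/");
        System.out.println(rules.isAllowed("/private/secret.html"));        // false
        System.out.println(rules.isAllowed("/private/public-docs/a.html")); // true
        System.out.println(rules.isAllowed("/index.html"));                 // true
    }
}
```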
fess-crawler/src/test/java/org/codelibs/fess/crawler/rule/impl/AbstractRuleTest.java
ConditionalAbstractRule conditionalRule = new ConditionalAbstractRule();
conditionalRule.crawlerContainer = container;
conditionalRule.setRuleId("conditionalRule");

// Set patterns
conditionalRule.setUrlPattern("https?://.*\\.example\\.com/.*");
conditionalRule.setMimeTypePattern("text/.*");

// Test matching
ResponseData responseData1 = new ResponseData();
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Wed Sep 03 14:42:53 UTC 2025 - 21.9K bytes
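The test configures one rule with both a URL pattern and a MIME type pattern. A minimal sketch of how such a rule might be evaluated, assuming both patterns must match the whole value and that an unset pattern imposes no constraint; `PatternRule` is hypothetical and not fess-crawler's `AbstractRule`.

```java
import java.util.regex.Pattern;

/** Hypothetical rule matched on URL and MIME type; an unset pattern accepts any value. */
class PatternRule {
    private Pattern urlPattern;
    private Pattern mimeTypePattern;

    void setUrlPattern(String regex) {
        urlPattern = Pattern.compile(regex);
    }

    void setMimeTypePattern(String regex) {
        mimeTypePattern = Pattern.compile(regex);
    }

    private static boolean matches(Pattern pattern, String value) {
        // Unset pattern: no constraint. Set pattern with a null value: no match.
        if (pattern == null) {
            return true;
        }
        return value != null && pattern.matcher(value).matches();
    }

    /** The rule applies only when both conditions hold. */
    boolean match(String url, String mimeType) {
        return matches(urlPattern, url) && matches(mimeTypePattern, mimeType);
    }

    public static void main(String[] args) {
        PatternRule rule = new PatternRule();
        rule.setUrlPattern("https?://.*\\.example\\.com/.*");
        rule.setMimeTypePattern("text/.*");
        System.out.println(rule.match("https://www.example.com/a.html", "text/html"));      // true
        System.out.println(rule.match("https://www.example.com/a.pdf", "application/pdf")); // false
        System.out.println(rule.match("https://example.com/a.html", "text/html"));          // false
    }
}
```

The last call is rejected because the URL pattern requires a literal `.` before `example.com`, so only subdomains such as `www.example.com` match.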
fess-crawler/src/main/java/org/codelibs/fess/crawler/container/StandardCrawlerContainer.java
/**
 * A container implementation that manages the lifecycle and dependency injection of components
 * in a crawler application. This container supports both singleton and prototype component
 * instantiation patterns.
 *
 * <p>The container provides mechanisms for:
 * <ul>
 *   <li>Registering and retrieving components by name</li>
 *   <li>Managing singleton instances with lifecycle hooks</li>
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 14.3K bytes
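The Javadoc distinguishes singleton and prototype instantiation. The sketch below shows that difference in a name-keyed registry, assuming singletons are created lazily on first lookup and cached, prototypes are rebuilt by their factory on every lookup, and a singleton's lifecycle hook is a destroy callback run at shutdown. `MiniContainer` and its methods are hypothetical and do not reproduce `StandardCrawlerContainer`'s API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical name-keyed container illustrating singleton vs prototype scope. */
class MiniContainer {
    private final Map<String, Supplier<?>> singletonFactories = new HashMap<>();
    private final Map<String, Supplier<?>> prototypeFactories = new HashMap<>();
    private final Map<String, Object> singletons = new HashMap<>();
    private final List<Runnable> destroyHooks = new ArrayList<>();

    /** Registers a component created once; onDestroy runs when the container shuts down. */
    <T> void singleton(String name, Supplier<T> factory, Consumer<T> onDestroy) {
        singletonFactories.put(name, () -> {
            T instance = factory.get();
            destroyHooks.add(() -> onDestroy.accept(instance));
            return instance;
        });
    }

    /** Registers a component rebuilt by its factory on every lookup. */
    <T> void prototype(String name, Supplier<T> factory) {
        prototypeFactories.put(name, factory);
    }

    @SuppressWarnings("unchecked")
    <T> T getComponent(String name) {
        Supplier<?> prototype = prototypeFactories.get(name);
        if (prototype != null) {
            return (T) prototype.get(); // fresh instance on each call
        }
        Supplier<?> factory = singletonFactories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No component named " + name);
        }
        Object instance = singletons.get(name);
        if (instance == null) {
            // Created lazily on first lookup, then cached for later calls.
            instance = factory.get();
            singletons.put(name, instance);
        }
        return (T) instance;
    }

    /** Runs destroy hooks in reverse creation order and drops cached singletons. */
    void destroy() {
        for (int i = destroyHooks.size() - 1; i >= 0; i--) {
            destroyHooks.get(i).run();
        }
        destroyHooks.clear();
        singletons.clear();
    }

    public static void main(String[] args) {
        MiniContainer c = new MiniContainer();
        c.singleton("config", () -> new StringBuilder("cfg"), sb -> System.out.println("destroying config"));
        c.prototype("buffer", () -> new StringBuilder());

        StringBuilder a = c.getComponent("config");
        StringBuilder b = c.getComponent("config");
        System.out.println(a == b); // true: same singleton instance

        StringBuilder p1 = c.getComponent("buffer");
        StringBuilder p2 = c.getComponent("buffer");
        System.out.println(p1 == p2); // false: new prototype each time

        c.destroy(); // prints "destroying config"
    }
}
```

Creating singletons lazily means a destroy hook is registered only for components that were actually used.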