Results 1 - 5 of 5 for patternset (0.16 sec)

  1. fess-crawler/src/test/java/org/codelibs/fess/crawler/filter/UrlFilterTest.java

            assertFalse(urlFilter.match("https://other.com/page.html"));
        }
    
        /**
         * Test match with no patterns configured
         */
        public void test_match_noPatterns() {
            String sessionId = "test-session-009";
            urlFilter.init(sessionId);
    
            // Without any patterns, all URLs should match
            assertTrue(urlFilter.match("https://example.com/"));
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Wed Sep 03 14:42:53 UTC 2025
    - 19K bytes
    - Viewed (0)
  2. README.md

    crawler.crawlerContext.setDefaultIntervalTime(1000); // 1 second
    ```
    
    ### URL Filtering
    
    ```java
    // Include patterns
    crawler.urlFilter.addInclude("https://example.com/.*");
    crawler.urlFilter.addInclude(".*\\.pdf$");
    
    // Exclude patterns  
    crawler.urlFilter.addExclude(".*\\.js$");
    crawler.urlFilter.addExclude(".*login.*");
    ```
    
    ## Supported Protocols and Formats
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
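
The include and exclude calls in the README excerpt above can be read as a two-stage regex check. The following is a minimal sketch of that idea, not the fess-crawler implementation: `SimpleUrlFilter` is a hypothetical class, and the choices that an empty include list accepts everything (as the `test_match_noPatterns` excerpt suggests), that excludes override includes, and that a pattern must match the whole URL are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the fess-crawler UrlFilter.
class SimpleUrlFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    void addInclude(String regex) {
        includes.add(Pattern.compile(regex));
    }

    void addExclude(String regex) {
        excludes.add(Pattern.compile(regex));
    }

    boolean match(String url) {
        // Assumption: an empty include list accepts every URL.
        boolean included = includes.isEmpty() || matchesAny(includes, url);
        // Assumption: a single exclude hit rejects the URL, even if it was included.
        return included && !matchesAny(excludes, url);
    }

    private static boolean matchesAny(List<Pattern> patterns, String url) {
        for (Pattern p : patterns) {
            // matches() requires the whole URL to match, not just a substring.
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the four patterns from the README excerpt, this sketch accepts any `https://example.com/` page or any PDF, then rejects URLs that end in `.js` or contain `login`.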
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/RobotsTxt.java

     * the most specific (longest) match is used.</p>
     *
     */
    public class RobotsTxt {
        private static final String ALL_BOTS = "*";
    
        /** Map of user agent patterns to their corresponding directives. */
        protected final Map<Pattern, Directive> directiveMap = new LinkedHashMap<>();
    
        /** List of sitemap URLs found in the robots.txt file. */
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 10K bytes
    - Viewed (0)
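
The truncated Javadoc above says only that the most specific (longest) match is used. A minimal sketch of such a rule follows, assuming it is applied to allow/disallow path prefixes in the way RFC 9309 describes; the excerpt does not show whether `RobotsTxt` applies it to paths or to user-agent patterns. `LongestMatchRules` and its methods are hypothetical names, and wildcard handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of longest-match rule selection; not the fess-crawler RobotsTxt class.
class LongestMatchRules {
    private static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void allow(String prefix) {
        rules.add(new Rule(prefix, true));
    }

    void disallow(String prefix) {
        rules.add(new Rule(prefix, false));
    }

    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) {
                continue; // rule does not apply to this path
            }
            boolean longer = best == null || r.prefix.length() > best.prefix.length();
            // On a tie in length, prefer the allow rule (the least restrictive choice).
            boolean tieAllow = best != null && r.prefix.length() == best.prefix.length() && r.allow;
            if (longer || tieAllow) {
                best = r;
            }
        }
        // No applicable rule means the path is allowed.
        return best == null || best.allow;
    }
}
```

In this sketch, disallowing `/private/` while allowing `/private/public/` leaves `/private/public/a.html` crawlable, because the allow rule is the longer match.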
  4. fess-crawler/src/test/java/org/codelibs/fess/crawler/rule/impl/AbstractRuleTest.java

            ConditionalAbstractRule conditionalRule = new ConditionalAbstractRule();
            conditionalRule.crawlerContainer = container;
            conditionalRule.setRuleId("conditionalRule");
    
            // Set patterns
            conditionalRule.setUrlPattern("https?://.*\\.example\\.com/.*");
            conditionalRule.setMimeTypePattern("text/.*");
    
            // Test matching
            ResponseData responseData1 = new ResponseData();
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Wed Sep 03 14:42:53 UTC 2025
    - 21.9K bytes
    - Viewed (0)
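
The test excerpt above configures a URL pattern and a MIME type pattern on one rule but is cut off before the assertions. As a rough sketch of how such a combined check could work, assuming both patterns must fully match and an unset pattern acts as a wildcard: `UrlMimeRule` is a hypothetical class, and `AbstractRule` may well behave differently.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not the fess-crawler AbstractRule implementation.
class UrlMimeRule {
    private Pattern urlPattern;
    private Pattern mimeTypePattern;

    void setUrlPattern(String regex) {
        this.urlPattern = Pattern.compile(regex);
    }

    void setMimeTypePattern(String regex) {
        this.mimeTypePattern = Pattern.compile(regex);
    }

    boolean match(String url, String mimeType) {
        // Assumption: every configured pattern must match the whole value.
        return matchesOrUnset(urlPattern, url) && matchesOrUnset(mimeTypePattern, mimeType);
    }

    private static boolean matchesOrUnset(Pattern pattern, String value) {
        if (pattern == null) {
            return true; // assumption: an unset pattern places no constraint
        }
        return value != null && pattern.matcher(value).matches();
    }
}
```

Note that the URL pattern from the excerpt, `https?://.*\\.example\\.com/.*`, requires a literal dot before `example.com`, so under full-match semantics it covers subdomains such as `www.example.com` but not the bare host `example.com`.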
  5. fess-crawler/src/main/java/org/codelibs/fess/crawler/container/StandardCrawlerContainer.java

    /**
     * A container implementation that manages the lifecycle and dependency injection of components
     * in a crawler application. This container supports both singleton and prototype component
     * instantiation patterns.
     *
     * <p>The container provides mechanisms for:
     * <ul>
     *   <li>Registering and retrieving components by name</li>
     *   <li>Managing singleton instances with lifecycle hooks</li>
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 14.3K bytes
    - Viewed (0)
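
The Javadoc above distinguishes singleton from prototype instantiation. The difference can be sketched as follows, under the assumption that a singleton is one shared instance while a prototype is rebuilt on every lookup. `MiniContainer` and its method signatures are hypothetical and do not reproduce `StandardCrawlerContainer`; lifecycle hooks are omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a name-based component container; not StandardCrawlerContainer.
class MiniContainer {
    private final Map<String, Object> singletons = new HashMap<>();
    private final Map<String, Supplier<?>> prototypes = new HashMap<>();

    // Registers one shared instance under the given name.
    <T> MiniContainer singleton(String name, T instance) {
        singletons.put(name, instance);
        return this;
    }

    // Registers a factory that is invoked on every lookup.
    <T> MiniContainer prototype(String name, Supplier<T> factory) {
        prototypes.put(name, factory);
        return this;
    }

    @SuppressWarnings("unchecked")
    <T> T getComponent(String name) {
        Object shared = singletons.get(name);
        if (shared != null) {
            return (T) shared; // same object on every call
        }
        Supplier<?> factory = prototypes.get(name);
        // Unknown names yield null in this sketch.
        return factory == null ? null : (T) factory.get();
    }
}
```

Looking up a singleton twice returns the same object, whereas two lookups of a prototype return two distinct objects.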