Search Options

Results per page
Sort
Preferred Languages
Advance

Results 1 - 10 of 15 for patternset (0.04 sec)

  1. fess-crawler/src/main/java/org/codelibs/fess/crawler/service/impl/UrlFilterServiceImpl.java

     * This class provides methods for managing URL filtering rules,
     * including adding include and exclude URL patterns, deleting patterns,
     * and retrieving lists of compiled URL patterns. It utilizes a
     * {@link MemoryDataHelper} to store and manage the URL patterns in memory.
     *
     */
    public class UrlFilterServiceImpl implements UrlFilterService {
    
        /**
         * Creates a new UrlFilterServiceImpl instance.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 4.2K bytes
    - Viewed (0)
  2. fess-crawler/src/test/java/org/codelibs/fess/crawler/filter/UrlFilterTest.java

            assertFalse(urlFilter.match("https://other.com/page.html"));
        }
    
        /**
         * Test match with no patterns configured
         */
        public void test_match_noPatterns() {
            String sessionId = "test-session-009";
            urlFilter.init(sessionId);
    
            // Without any patterns, all URLs should match
            assertTrue(urlFilter.match("https://example.com/"));
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Wed Sep 03 14:42:53 UTC 2025
    - 19K bytes
    - Viewed (0)
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/client/CrawlerClientFactory.java

    /**
     * A factory class for managing and creating crawler clients based on URL patterns.
     * This class implements AutoCloseable to properly handle resource cleanup.
     *
     * <p>The factory maintains a map of regular expression patterns to crawler clients,
     * allowing for URL-based client selection. Clients can be added with specific patterns
     * and optionally at specific positions in the processing order.</p>
     *
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 7K bytes
    - Viewed (0)
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/service/UrlFilterService.java

        /**
         * Retrieves a list of URL patterns to include for a given session.
         *
         * @param sessionId the ID of the session for which to retrieve the include URL patterns
         * @return a list of compiled regular expression patterns representing the URLs to include
         */
        List<Pattern> getIncludeUrlPatternList(String sessionId);
    
        /**
         * Retrieves a list of URL patterns to be excluded for a given session.
         *
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sat Mar 15 06:52:00 UTC 2025
    - 3.1K bytes
    - Viewed (0)
  5. fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/OpenSearchUrlFilterService.java

            excludeFilterCache.invalidate(sessionId);
        }
    
        /**
         * Gets the list of include URL patterns for the specified session.
         *
         * @param sessionId The session ID.
         * @return The list of compiled include patterns.
         * @throws CrawlerSystemException if the patterns cannot be loaded.
         */
        @Override
        public List<Pattern> getIncludeUrlPatternList(final String sessionId) {
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 9.2K bytes
    - Viewed (0)
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/MemoryDataHelper.java

        protected volatile Map<String, Map<String, AccessResultImpl<Long>>> sessionMap = new HashMap<>();
    
        /** Map of session IDs to include URL patterns for filtering URLs. */
        protected volatile Map<String, List<Pattern>> includeUrlPatternMap = new HashMap<>();
    
        /** Map of session IDs to exclude URL patterns for filtering URLs. */
        protected volatile Map<String, List<Pattern>> excludeUrlPatternMap = new HashMap<>();
    
        /**
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 8.1K bytes
    - Viewed (0)
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/filter/impl/UrlFilterImpl.java

     * It uses a {@link UrlFilterService} to manage the URL filtering rules.
     * The class supports caching of include and exclude patterns for scenarios where a session ID is not available.
     * It also provides methods to initialize the filter with a session ID, clear the filter,
     * match a URL against the defined patterns, and process a URL to add include or exclude patterns based on predefined filtering patterns.
     *
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 9.2K bytes
    - Viewed (0)
  8. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java

     *
     * <p>The extractor supports two types of password management:
     * <ul>
     *   <li>Static passwords configured via {@link #addPassword(String, String)}</li>
     *   <li>Dynamic passwords provided through extraction parameters</li>
     * </ul>
     *
     * <p>Passwords are matched against URLs or resource names using regular expression patterns.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 5.1K bytes
    - Viewed (0)
  9. fess-crawler/src/main/java/org/codelibs/fess/crawler/rule/impl/RegexRule.java

         * @return true if all patterns must match, false if only one needs to match
         */
        public boolean isAllRequired() {
            return allRequired;
        }
    
        /**
         * Sets whether all regular expressions must match for this rule to match.
         * @param allRequired true if all patterns must match, false if only one needs to match
         */
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 6.2K bytes
    - Viewed (0)
  10. README.md

    crawler.crawlerContext.setDefaultIntervalTime(1000); // 1 second
    ```
    
    ### URL Filtering
    
    ```java
    // Include patterns
    crawler.urlFilter.addInclude("https://example.com/.*");
    crawler.urlFilter.addInclude(".*\\.pdf$");
    
    // Exclude patterns  
    crawler.urlFilter.addExclude(".*\\.js$");
    crawler.urlFilter.addExclude(".*login.*");
    ```
    
    ## Supported Protocols and Formats
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
Back to top