Results 11 - 19 of 19 for urlFilters (0.06 sec)

  1. fess-crawler/src/test/java/org/codelibs/fess/crawler/client/http/HcHttpClientTest.java

                    .singleton("urlFilter", UrlFilterImpl.class)//
                    .singleton("robotsTxtHelper", RobotsTxtHelper.class)//
                    .singleton("httpClient", HcHttpClient.class);
            httpClient = container.getComponent("httpClient");
            urlFilter = container.getComponent("urlFilter");
        }
    
        public void test_doGet() {
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sat Sep 06 04:15:37 UTC 2025
    - 11.7K bytes
    - Viewed (0)
  2. fess-crawler-opensearch/src/test/java/org/codelibs/fess/crawler/CrawlerTest.java

                crawler.addUrl(url);
                crawler.getCrawlerContext().setMaxAccessCount(maxCount);
                crawler.getCrawlerContext().setNumOfThread(numOfThread);
                crawler.urlFilter.addInclude(url + ".*");
                final String sessionId = crawler.execute();
                assertEquals(maxCount, dataService.getCount(sessionId));
                dataService.delete(sessionId);
            } finally {
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sat Sep 06 04:15:37 UTC 2025
    - 7.7K bytes
    - Viewed (0)
  3. README.md

    crawler.crawlerContext.setDefaultIntervalTime(1000); // 1 second
    ```
    
    ### URL Filtering
    
    ```java
    // Include patterns
    crawler.urlFilter.addInclude("https://example.com/.*");
    crawler.urlFilter.addInclude(".*\\.pdf$");
    
    // Exclude patterns  
    crawler.urlFilter.addExclude(".*\\.js$");
    crawler.urlFilter.addExclude(".*login.*");
    ```
    
    ## Supported Protocols and Formats
    
    ### Protocols
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
  4. src/main/java/org/codelibs/fess/helper/WebFsIndexHelper.java

                    try {
                        urlFilterService.delete(sid);
                    } catch (final Exception e) {
                        logger.warn("Failed to delete UrlFilter: sessionId={}", sid);
                    }
                }
    
                final DuplicateHostHelper duplicateHostHelper = ComponentUtil.getDuplicateHostHelper();
    
                // set urls
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 25K bytes
    - Viewed (0)
  5. fess-crawler/src/test/java/org/codelibs/fess/crawler/CrawlerContextTest.java

        protected void setUp() throws Exception {
            super.setUp();
            crawlerContext = new CrawlerContext();
        }
    
        /**
         * Test implementation of UrlFilter for testing
         */
        private static class TestUrlFilter implements UrlFilter {
            @Override
            public void init(String sessionId) {
            }
    
            @Override
            public void addInclude(String urlPattern) {
            }
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sat Sep 06 04:15:37 UTC 2025
    - 25.6K bytes
    - Viewed (0)
  6. fess-crawler-lasta/src/main/resources/crawler/filter.xml

    <!DOCTYPE components PUBLIC "-//DBFLUTE//DTD LastaDi 1.0//EN"
    	"http://dbflute.org/meta/lastadi10.dtd">
    <components namespace="fessCrawler">
    	<include path="crawler/container.xml" />
    
    	<component name="urlFilter"
    		class="org.codelibs.fess.crawler.filter.impl.UrlFilterImpl" instance="prototype">
    	</component>
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Oct 11 02:16:55 UTC 2015
    - 364 bytes
    - Viewed (0)
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/CrawlerThread.java

            final Set<String> urlSet = new HashSet<>();
            final List<UrlQueue<?>> childList = childUrlList.stream()
                    .filter(d -> StringUtil.isNotBlank(d.getUrl()) && urlSet.add(d.getUrl()) && crawlerContext.urlFilter.match(d.getUrl()))
                    .map(d -> {
                        final UrlQueue<?> uq = crawlerContainer.getComponent("urlQueue");
                        uq.setCreateTime(SystemUtil.currentTimeMillis());
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 20.4K bytes
    - Viewed (0)
  8. CLAUDE.md

    ```
    
    3. **Add test with sample file** in `src/test/resources/`
    
    ### Configuring URL Filtering
    
    ```java
    // Include patterns (must match)
    crawler.urlFilter.addInclude("https://example.com/.*");
    
    // Exclude patterns (must not match)
    crawler.urlFilter.addExclude(".*\\.(css|js|png|jpg)$");
    ```
    
    ### Setting Crawl Limits
    
    ```java
    context.setMaxAccessCount(1000);  // Max URLs (0 = unlimited)
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Fri Nov 28 17:31:34 UTC 2025
    - 10.7K bytes
    - Viewed (0)
  9. src/main/java/org/codelibs/fess/indexer/IndexUpdater.java

         */
        private void deleteBySessionId(final String sessionId) {
            try {
                urlFilterService.delete(sessionId);
            } catch (final Exception e) {
                logger.warn("Failed to delete UrlFilter: sessionId={}", sessionId, e);
            }
            try {
                urlQueueService.delete(sessionId);
            } catch (final Exception e) {
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 32.9K bytes
    - Viewed (0)
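
The `addInclude` / `addExclude` calls in results 3 and 8 are annotated "must match" and "must not match". As a minimal sketch of that rule, and not the actual `UrlFilterImpl` code, the class below uses only `java.util.regex`. The class name `UrlFilterSketch`, the full-string matching via `Matcher.matches()`, and the "empty include list accepts everything" behaviour are assumptions made for illustration; the patterns are the ones shown in the README excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of include/exclude URL filtering.
// This is NOT the fess-crawler implementation; the semantics are assumed.
public class UrlFilterSketch {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    public void addInclude(String urlPattern) {
        includes.add(Pattern.compile(urlPattern));
    }

    public void addExclude(String urlPattern) {
        excludes.add(Pattern.compile(urlPattern));
    }

    // Accept a URL if it matches at least one include pattern
    // (or no include patterns are registered) and no exclude pattern.
    // Assumption: the whole URL must match (Matcher.matches()).
    public boolean match(String url) {
        if (!includes.isEmpty()
                && includes.stream().noneMatch(p -> p.matcher(url).matches())) {
            return false;
        }
        return excludes.stream().noneMatch(p -> p.matcher(url).matches());
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch();
        filter.addInclude("https://example.com/.*");
        filter.addInclude(".*\\.pdf$");
        filter.addExclude(".*\\.js$");
        filter.addExclude(".*login.*");

        System.out.println(filter.match("https://example.com/docs/index.html")); // true
        System.out.println(filter.match("https://example.com/app.js"));          // false
        System.out.println(filter.match("https://other.org/paper.pdf"));         // true
        System.out.println(filter.match("https://other.org/index.html"));        // false
    }
}
```

Under these assumptions an exclude pattern always wins over an include pattern, which is why `https://example.com/app.js` is rejected even though it matches the first include.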