Results 11 - 19 of 19 for urlFilter (0.05 sec)
fess-crawler-opensearch/src/test/java/org/codelibs/fess/crawler/CrawlerTest.java
crawler.addUrl(url); crawler.getCrawlerContext().setMaxAccessCount(maxCount); crawler.getCrawlerContext().setNumOfThread(numOfThread); crawler.urlFilter.addInclude(url + ".*"); final String sessionId = crawler.execute(); assertEquals(maxCount, dataService.getCount(sessionId)); dataService.delete(sessionId); } finally {
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sat Sep 06 04:15:37 UTC 2025 - 7.7K bytes - Viewed (0) -
README.md
crawler.crawlerContext.setDefaultIntervalTime(1000); // 1 second ``` ### URL Filtering ```java // Include patterns crawler.urlFilter.addInclude("https://example.com/.*"); crawler.urlFilter.addInclude(".*\\.pdf$"); // Exclude patterns crawler.urlFilter.addExclude(".*\\.js$"); crawler.urlFilter.addExclude(".*login.*"); ``` ## Supported Protocols and Formats ### Protocols
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/helper/WebFsIndexHelper.java
try { urlFilterService.delete(sid); } catch (final Exception e) { logger.warn("Failed to delete UrlFilter: sessionId={}", sid); } } final DuplicateHostHelper duplicateHostHelper = ComponentUtil.getDuplicateHostHelper(); // set urls
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 25K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/CrawlerContextTest.java
protected void setUp() throws Exception { super.setUp(); crawlerContext = new CrawlerContext(); } /** * Test implementation of UrlFilter for testing */ private static class TestUrlFilter implements UrlFilter { @Override public void init(String sessionId) { } @Override public void addInclude(String urlPattern) { }
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sat Sep 06 04:15:37 UTC 2025 - 25.6K bytes - Viewed (0) -
fess-crawler-lasta/src/main/resources/crawler/filter.xml
<!DOCTYPE components PUBLIC "-//DBFLUTE//DTD LastaDi 1.0//EN" "http://dbflute.org/meta/lastadi10.dtd"> <components namespace="fessCrawler"> <include path="crawler/container.xml" /> <component name="urlFilter" class="org.codelibs.fess.crawler.filter.impl.UrlFilterImpl" instance="prototype"> </component>
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Oct 11 02:16:55 UTC 2015 - 364 bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/CrawlerThread.java
final Set<String> urlSet = new HashSet<>(); final List<UrlQueue<?>> childList = childUrlList.stream() .filter(d -> StringUtil.isNotBlank(d.getUrl()) && urlSet.add(d.getUrl()) && crawlerContext.urlFilter.match(d.getUrl())) .map(d -> { final UrlQueue<?> uq = crawlerContainer.getComponent("urlQueue"); uq.setCreateTime(SystemUtil.currentTimeMillis());
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 20.4K bytes - Viewed (0) -
CLAUDE.md
``` 3. **Add test with sample file** in `src/test/resources/` ### Configuring URL Filtering ```java // Include patterns (must match) crawler.urlFilter.addInclude("https://example.com/.*"); // Exclude patterns (must not match) crawler.urlFilter.addExclude(".*\\.(css|js|png|jpg)$"); ``` ### Setting Crawl Limits ```java context.setMaxAccessCount(1000); // Max URLs (0 = unlimited)
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Fri Nov 28 17:31:34 UTC 2025 - 10.7K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/indexer/IndexUpdater.java
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 32.9K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/opensearch/config/exentity/CrawlingConfig.java
public static final String KEEP_ORIGINAL_BODY = "keep.original.body"; public static final String CLEANUP_ALL = "cleanup.all"; public static final String CLEANUP_URL_FILTERS = "cleanup.urlFilters"; public static final String JCIFS_PREFIX = "jcifs."; public static final String HTML_CANONICAL_XPATH = "html.canonical.xpath";
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Sat Mar 15 06:53:53 UTC 2025 - 5.6K bytes - Viewed (0)