Results 1 - 2 of 2 for htmlExtractor (0.35 seconds)
CLAUDE.md
- **Extractor**: Weight-based selection (tries in descending weight order)

### Key Extractors
`TikaExtractor`, `PdfExtractor`, `MsWordExtractor`, `MsExcelExtractor`, `MsPowerPointExtractor`, `ZipExtractor`, `HtmlExtractor`, `MarkdownExtractor`, `EmlExtractor`

### Helpers
- **RobotsTxtHelper**: RFC 9309 parsing, user-agent matching, crawl-delay, sitemaps
- **SitemapsHelper**: Sitemap XML parsing, index handling
Created: Sun Apr 12 03:50:13 GMT 2026 - Last Modified: Thu Mar 12 03:39:20 GMT 2026 - 8.1K bytes
fess-crawler-lasta/src/main/resources/crawler/extractor.xml
class="org.codelibs.fess.crawler.extractor.impl.LhaExtractor" />
<component name="textExtractor" class="org.codelibs.fess.crawler.extractor.impl.TextExtractor" />
<component name="htmlExtractor" class="org.codelibs.fess.crawler.extractor.impl.HtmlExtractor">
	<property name="featureMap">
		<component class="java.util.LinkedHashMap">
			<postConstruct name="put">
				<arg>"http://xml.org/sax/features/namespaces"</arg>
Created: Sun Apr 12 03:50:13 GMT 2026 - Last Modified: Wed Feb 11 01:15:55 GMT 2026 - 50.4K bytes