Results 1 - 2 of 2 for htmlExtractor (0.35 seconds)

  1. CLAUDE.md

    - **Extractor**: Weight-based selection (tries in descending weight order)
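The weight-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the fess-crawler implementation: the `SketchExtractor` interface, the `Fixed` class, the `extractWithBest` method, and the weights are all hypothetical. Falling through to the next candidate when one cannot handle the input is also an assumption; the snippet only says candidates are tried in descending weight order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class WeightedSelectionSketch {

    /** Hypothetical stand-in for an extractor; NOT the fess-crawler interface. */
    interface SketchExtractor {
        int weight();

        /** Returns extracted text, or empty if this extractor cannot handle the input. */
        Optional<String> extract(String input);
    }

    /** Hypothetical extractor with a fixed name, weight, and handle/decline behaviour. */
    static final class Fixed implements SketchExtractor {
        private final String name;
        private final int weight;
        private final boolean handles;

        Fixed(String name, int weight, boolean handles) {
            this.name = name;
            this.weight = weight;
            this.handles = handles;
        }

        @Override
        public int weight() {
            return weight;
        }

        @Override
        public Optional<String> extract(String input) {
            return handles ? Optional.of(name + ":" + input) : Optional.empty();
        }
    }

    /** Tries candidates from highest to lowest weight; returns the first result produced. */
    static Optional<String> extractWithBest(List<SketchExtractor> candidates, String input) {
        List<SketchExtractor> ordered = new ArrayList<>(candidates);
        ordered.sort(Comparator.comparingInt(SketchExtractor::weight).reversed());
        for (SketchExtractor candidate : ordered) {
            Optional<String> result = candidate.extract(input);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    /** Small demo: the heaviest candidate declines, so the next one by weight is used. */
    static String demo() {
        List<SketchExtractor> candidates = Arrays.asList(
                new Fixed("low", 1, true),
                new Fixed("mid", 5, true),
                new Fixed("high", 10, false));
        return extractWithBest(candidates, "doc").orElse("none");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "mid:doc"
    }
}
```

Sorting a copy keeps the caller's list untouched, and returning `Optional` makes the "no extractor could handle this" case explicit rather than signalling it with `null`.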
    
    ### Key Extractors
    
    `TikaExtractor`, `PdfExtractor`, `MsWordExtractor`, `MsExcelExtractor`, `MsPowerPointExtractor`, `ZipExtractor`, `HtmlExtractor`, `MarkdownExtractor`, `EmlExtractor`
    
    ### Helpers
    
    - **RobotsTxtHelper**: RFC 9309 parsing, user-agent matching, crawl-delay, sitemaps
    - **SitemapsHelper**: Sitemap XML parsing, index handling

    - Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Thu Mar 12 03:39:20 GMT 2026
    - 8.1K bytes
  2. fess-crawler-lasta/src/main/resources/crawler/extractor.xml

    		class="org.codelibs.fess.crawler.extractor.impl.LhaExtractor" />
    	<component name="textExtractor"
    		class="org.codelibs.fess.crawler.extractor.impl.TextExtractor" />
    	<component name="htmlExtractor"
    		class="org.codelibs.fess.crawler.extractor.impl.HtmlExtractor">
    		<property name="featureMap">
    			<component class="java.util.LinkedHashMap">
    				<postConstruct name="put">
    					<arg>"http://xml.org/sax/features/namespaces"</arg>

    - Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Wed Feb 11 01:15:55 GMT 2026
    - 50.4K bytes