Results 1 - 3 of 3 for HtmlExtractor (0.05 sec)
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlExtractor.java
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/**
 * Extracts text content from HTML documents.
 */
public class HtmlExtractor extends AbstractXmlExtractor {
    /** Logger for this class. */
    protected static final Logger logger = LogManager.getLogger(HtmlExtractor.class);

    /** Pattern for extracting charset from meta tags. */
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sat Oct 04 08:47:19 UTC 2025 - 9.3K bytes - Viewed (0) -
CLAUDE.md
### Key Extractors

`TikaExtractor` (1000+ formats), `PdfExtractor`, `MsWordExtractor`, `MsExcelExtractor`, `MsPowerPointExtractor`, `ZipExtractor`, `HtmlExtractor`, etc.

**Registration**:

```java
extractorFactory.addExtractor("text/html", htmlExtractor, 2); // Weight 2
extractorFactory.addExtractor("text/html", tikaExtractor, 1); // Fallback
```

### Helpers
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Fri Nov 28 17:31:34 UTC 2025 - 10.7K bytes - Viewed (0) -
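The registration snippet in the CLAUDE.md result suggests that several extractors can be registered for one MIME type, with the higher weight preferred and the lower one acting as a fallback. The following is a minimal, self-contained sketch of that idea only. `WeightedRegistrySketch`, its nested `Registry`, `Extractor`, and `Entry` types, and the `orderFor` method are hypothetical names introduced for illustration; they are not the fess-crawler `ExtractorFactory` API, and the "higher weight first" ordering is an assumption drawn from the `// Weight 2` and `// Fallback` comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedRegistrySketch {

    /** Hypothetical stand-in for an extractor; not the fess-crawler interface. */
    interface Extractor {
        String name();
    }

    /** One registration: an extractor together with its weight. */
    static final class Entry {
        final Extractor extractor;
        final int weight;

        Entry(Extractor extractor, int weight) {
            this.extractor = extractor;
            this.weight = weight;
        }
    }

    /** Hypothetical registry keyed by MIME type (assumption: higher weight is preferred). */
    static class Registry {
        private final Map<String, List<Entry>> byMimeType = new HashMap<>();

        void addExtractor(String mimeType, Extractor extractor, int weight) {
            List<Entry> entries = byMimeType.computeIfAbsent(mimeType, k -> new ArrayList<>());
            entries.add(new Entry(extractor, weight));
            // Sort by descending weight; List.sort is stable, so equal weights keep insertion order.
            entries.sort((a, b) -> Integer.compare(b.weight, a.weight));
        }

        /** Returns extractor names in the order they would be tried for the MIME type. */
        List<String> orderFor(String mimeType) {
            List<String> names = new ArrayList<>();
            for (Entry e : byMimeType.getOrDefault(mimeType, List.of())) {
                names.add(e.extractor.name());
            }
            return names;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Registration order is deliberately reversed to show that weight, not order, decides.
        registry.addExtractor("text/html", () -> "tikaExtractor", 1);
        registry.addExtractor("text/html", () -> "htmlExtractor", 2);
        System.out.println(registry.orderFor("text/html"));
    }
}
```

In this sketch the list is re-sorted on every registration, which keeps lookups trivial at the cost of a small sort per `addExtractor` call; a real factory may resolve priorities differently.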
fess-crawler-lasta/src/main/resources/crawler/extractor.xml
class="org.codelibs.fess.crawler.extractor.impl.LhaExtractor" />
<component name="textExtractor" class="org.codelibs.fess.crawler.extractor.impl.TextExtractor" />
<component name="htmlExtractor" class="org.codelibs.fess.crawler.extractor.impl.HtmlExtractor">
    <property name="featureMap">
        <component class="java.util.LinkedHashMap">
            <postConstruct name="put">
                <arg>"http://xml.org/sax/features/namespaces"</arg>
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 50.1K bytes - Viewed (0)