Results 21 - 30 of 130 for extraction (2.18 sec)
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/CsvExtractorTest.java
public void test_getText() { final InputStream in = ResourceUtil.getResourceAsStream("extractor/csv/test.csv"); final ExtractData extractData = csvExtractor.getText(in, null); CloseableUtil.closeQuietly(in); final String content = extractData.getContent(); logger.info(content); // Verify header extraction assertTrue(content.contains("Name")); assertTrue(content.contains("Email"));
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 5.3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java
* * <p>The extractor supports two types of password management: * <ul> * <li>Static passwords configured via {@link #addPassword(String, String)}</li> * <li>Dynamic passwords provided through extraction parameters</li> * </ul> * * <p>Passwords are matched against URLs or resource names using regular expression patterns.
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 5.1K bytes - Viewed (0) -
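The Javadoc excerpt above says static passwords are registered with `addPassword(String, String)` and matched against URLs or resource names by regular expression. A minimal sketch of that idea follows. The class name `PasswordLookupSketch`, the `findPassword` method, the full-match semantics, and the first-match-wins ordering are all assumptions made for illustration, not the actual `PasswordBasedExtractor` code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from fess-crawler.
class PasswordLookupSketch {
    // Insertion-ordered so that the first registered matching pattern wins (assumption).
    private final Map<Pattern, String> passwords = new LinkedHashMap<>();

    // Registers a password for every URL or resource name matching the regex.
    void addPassword(final String regex, final String password) {
        passwords.put(Pattern.compile(regex), password);
    }

    // Returns the password of the first pattern that fully matches the given name.
    // Whether the real extractor uses full or partial matching is not shown in the excerpt.
    Optional<String> findPassword(final String urlOrName) {
        if (urlOrName == null) {
            return Optional.empty();
        }
        for (final Map.Entry<Pattern, String> entry : passwords.entrySet()) {
            if (entry.getKey().matcher(urlOrName).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(final String[] args) {
        final PasswordLookupSketch sketch = new PasswordLookupSketch();
        sketch.addPassword(".*\\.pdf", "secret");
        System.out.println(sketch.findPassword("http://example.com/report.pdf"));
        System.out.println(sketch.findPassword("notes.docx"));
    }
}
```

Dynamic passwords supplied through extraction parameters would need a second lookup path; the excerpt does not show the parameter key, so it is left out here.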
CLAUDE.md
**Fess Crawler** is a Java-based web crawling framework for enterprise content extraction. ### Essential Info - **Language**: Java 21+ - **Build**: Maven 3.x - **License**: Apache 2.0 - **DI**: LastaFlute DI - **Repo**: https://github.com/codelibs/fess-crawler ### Tech Stack - **HTTP**: Apache HttpComponents 4.5+ - **Extraction**: Apache Tika 3.0+, POI 5.3+, PDFBox 3.0+ - **Testing**: JUnit 4, UTFlute, Mockito 5.7.0
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Fri Nov 28 17:31:34 UTC 2025 - 10.7K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TextExtractor.java
} catch (final Exception e) { throw new ExtractException("Failed to extract text content using encoding: " + getEncoding(), e); } } /** * Returns the encoding used for text extraction. * @return the encoding */ public String getEncoding() { return encoding; } /** * Sets the encoding. * @param encoding The encoding to set. */
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Thu Dec 11 08:38:29 UTC 2025 - 2K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/FilenameExtractor.java
* @return An ExtractData object containing the filename as content, or empty string if not found * @throws CrawlerSystemException if the input stream is null * @throws ExtractException if an unexpected error occurs during extraction */ @Override public ExtractData getText(final InputStream in, final Map<String, String> params) { validateInputStream(in); try {
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Wed Nov 19 08:55:01 UTC 2025 - 2.7K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/crawler/transformer/FessFileTransformer.java
throw new FessSystemException("Could not find extractorFactory."); } final Extractor extractor = extractorFactory.getExtractor(responseData.getMimeType()); if (logger.isDebugEnabled()) { logger.debug("url={}, extractor={}", responseData.getUrl(), extractor); } return extractor; }
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 3.5K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/ExtractException.java
* governing permissions and limitations under the License. */ package org.codelibs.fess.crawler.exception; /** * Exception thrown during the extraction process in the crawler. * This exception indicates a failure or error that occurred while extracting content from a crawled resource. * It extends {@link org.codelibs.fess.crawler.exception.CrawlerSystemException} and provides constructors
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sat Mar 15 06:52:00 UTC 2025 - 3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/HtmlTransformer.java
this.propertyMap = propertyMap; } /** * Gets the map of child URL extraction rules. * * @return the child URL rule map */ public Map<String, String> getChildUrlRuleMap() { return childUrlRuleMap; } /** * Sets the map of child URL extraction rules. * * @param childUrlRuleMap the child URL rule map to set */
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sat Nov 29 07:42:33 UTC 2025 - 30.5K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java
} /** * Gets the appropriate extractor for the given response data. * Selects an extractor based on the MIME type or falls back to the Tika extractor. * * @param responseData the response data containing the document to extract * @return the extractor instance for processing the document * @throws FessSystemException if no suitable extractor can be found */ @Override
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 3.8K bytes - Viewed (0) -
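The Javadoc above describes selecting an extractor by MIME type, falling back to a Tika-based extractor, and failing if nothing suitable exists. A rough sketch of that selection order is shown below. The class `ExtractorSelectionSketch`, the nested `SimpleExtractor` interface, and the use of `IllegalStateException` in place of `FessSystemException` are stand-ins invented for illustration; the real factory and extractor types in Fess are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the FessStandardTransformer implementation.
class ExtractorSelectionSketch {
    // Stand-in for the real extractor type (assumption).
    interface SimpleExtractor {
        String name();
    }

    private final Map<String, SimpleExtractor> byMimeType = new HashMap<>();
    private final SimpleExtractor fallback; // plays the role of the Tika extractor

    ExtractorSelectionSketch(final SimpleExtractor fallback) {
        this.fallback = fallback;
    }

    // Associates a MIME type with a dedicated extractor.
    void register(final String mimeType, final SimpleExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    // Prefers the MIME-specific extractor, then the fallback, and fails otherwise.
    SimpleExtractor select(final String mimeType) {
        SimpleExtractor extractor = mimeType == null ? null : byMimeType.get(mimeType);
        if (extractor == null) {
            extractor = fallback;
        }
        if (extractor == null) {
            // The source throws FessSystemException; a JDK exception is used here instead.
            throw new IllegalStateException("Could not find a suitable extractor.");
        }
        return extractor;
    }

    public static void main(final String[] args) {
        final ExtractorSelectionSketch selector = new ExtractorSelectionSketch(() -> "tika");
        selector.register("text/csv", () -> "csv");
        System.out.println(selector.select("text/csv").name());
        System.out.println(selector.select("application/octet-stream").name());
    }
}
```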
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/MarkdownExtractorTest.java
final InputStream in = ResourceUtil.getResourceAsStream("extractor/markdown/test.md"); final ExtractData extractData = markdownExtractor.getText(in, null); CloseableUtil.closeQuietly(in); final String content = extractData.getContent(); logger.info(content); // Verify plain text extraction assertTrue(content.contains("Introduction"));
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Mon Nov 24 03:59:47 UTC 2025 - 6.4K bytes - Viewed (0)