Results 1 - 10 of 28 for Extraction (0.05 sec)
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/FilenameExtractorEnhancedTest.java
} /** * Test extraction with null parameters map. */ public void test_getText_withNullParams() { final InputStream in = new ByteArrayInputStream(new byte[0]); final ExtractData result = filenameExtractor.getText(in, null); assertNotNull(result); assertEquals("", result.getContent()); } /** * Test extraction with empty parameters map. */
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Mon Nov 24 03:59:47 UTC 2025 - 7K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/TextExtractorEnhancedTest.java
assertTrue("Error message should indicate extraction failure", e.getMessage().contains("Failed to extract")); } finally { // Reset to default encoding textExtractor.setEncoding("UTF-8"); } } /** * Test extraction with empty input stream. */ public void test_getText_emptyInputStream_returnsEmptyContent() {
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Mon Nov 24 03:59:47 UTC 2025 - 8.9K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/JsonExtractor.java
* This extractor provides better structured data extraction compared to Tika's generic text extraction. * * <p>Features: * <ul> * <li>Structured text extraction with key-value pairs</li> * <li>Top-level field extraction as metadata</li> * <li>Nested structure flattening with configurable depth</li> * <li>Array element extraction</li> * <li>Configurable field separator and array formatting</li> * </ul>
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 9.7K bytes - Viewed (0) -
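The "nested structure flattening with configurable depth" feature listed in the JsonExtractor snippet can be illustrated with a minimal sketch. This is not the JsonExtractor implementation: the class `FlattenSketch`, the method `flatten`, the `.` separator, and the `[i]` array notation are all hypothetical choices made for illustration, and the input is assumed to be already parsed into `Map`/`List` values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of fess-crawler.
final class FlattenSketch {

    /**
     * Flattens nested maps and lists into path -> value entries.
     * Containers found at or beyond maxDepth are kept as their string form.
     */
    static Map<String, String> flatten(final Map<String, Object> source, final int maxDepth) {
        final Map<String, String> out = new LinkedHashMap<>();
        walk("", source, 0, maxDepth, out);
        return out;
    }

    private static void walk(final String prefix, final Object node, final int depth,
            final int maxDepth, final Map<String, String> out) {
        if (node instanceof Map<?, ?> && depth < maxDepth) {
            // Descend into an object, joining keys with the assumed "." separator.
            for (final Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                final String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
                walk(key, e.getValue(), depth + 1, maxDepth, out);
            }
        } else if (node instanceof List<?> && depth < maxDepth) {
            // Descend into an array, indexing elements as prefix[i].
            final List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "[" + i + "]", list.get(i), depth + 1, maxDepth, out);
            }
        } else {
            // Leaf value, or a container at the depth limit.
            out.put(prefix, String.valueOf(node));
        }
    }
}
```

Calling `FlattenSketch.flatten(root, 3)` on a parsed document yields keys such as `project.name` and `tags[0]`; lowering the depth limit leaves deeper containers unexpanded.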
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/MarkdownExtractor.java
* This extractor provides better structured data extraction compared to Tika's generic text extraction. * * <p>Features: * <ul> * <li>YAML front matter metadata extraction</li> * <li>Heading structure extraction</li> * <li>Link URL extraction</li> * <li>Code block content extraction</li> * <li>Clean text conversion from Markdown</li> * <li>Configurable encoding</li>
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 8.2K bytes - Viewed (0) -
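The "YAML front matter metadata extraction" feature listed in the MarkdownExtractor snippet can be sketched roughly as follows. This is an illustration only, not the MarkdownExtractor code: `FrontMatterSketch` and `parse` are hypothetical names, and the sketch assumes front matter is a leading block delimited by `---` lines and reads only flat `key: value` pairs rather than full YAML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of fess-crawler.
final class FrontMatterSketch {

    /** Returns flat key/value pairs from a leading "---" block, or an empty map if none is found. */
    static Map<String, String> parse(final String markdown) {
        final Map<String, String> meta = new LinkedHashMap<>();
        final String[] lines = markdown.split("\\R", -1);
        if (!lines[0].trim().equals("---")) {
            return meta; // document does not start with front matter
        }
        for (int i = 1; i < lines.length; i++) {
            final String line = lines[i];
            if (line.trim().equals("---")) {
                return meta; // closing delimiter reached
            }
            final int colon = line.indexOf(':');
            // Only top-level "key: value" lines; nested values and list items are skipped.
            if (colon > 0 && !line.startsWith(" ") && !line.startsWith("-")) {
                meta.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return new LinkedHashMap<>(); // unterminated block: treat as no front matter
    }
}
```

A real extractor would hand the delimited block to a YAML parser; the sketch only shows where the metadata block begins and ends.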
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/LhaExtractor.java
* * @param in the input stream containing the LHA archive * @param params extraction parameters * @return the extracted text data * @throws CrawlerSystemException if the input stream is null * @throws ExtractException if an error occurs during extraction * @throws MaxLengthExceededException if the extracted content size exceeds the maximum limit */ @Override
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 12:19:14 UTC 2025 - 5.9K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/AbstractXmlExtractor.java
/** * Default character encoding for content extraction. */ protected String encoding = Constants.UTF_8; /** * The preload size for charset detection. */ protected int preloadSizeForCharset = 2048; /** * Indicates whether comment tags should be ignored during extraction. */ protected boolean ignoreCommentTag = false; /**
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 12:19:14 UTC 2025 - 8.6K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/EXTRACTOR_TESTS_README.md
**Key Test Areas**: - Resource closure on successful extraction (MS Office extractors) - Resource closure on failed extraction - Improved error messages with context - Input validation using `validateInputStream()` **Covered Extractors**: - MsWordExtractor - MsExcelExtractor - MsPowerPointExtractor - TextExtractor **Test Count**: 8 tests **Key Scenarios**: - ✅ Successful extraction closes resources properly
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Wed Nov 19 08:55:01 UTC 2025 - 5.7K bytes - Viewed (0) -
src/test/java/jcifs/smb1/smb1/SmbFileTest.java
// Test file name extraction assertEquals("file.txt", new SmbFile("smb1://server/share/file.txt").getName()); // Test directory name extraction (should include trailing slash) assertEquals("dir/", new SmbFile("smb1://server/share/dir/").getName()); // Test share name extraction assertEquals("share/", new SmbFile("smb1://server/share/").getName());
Registered: Sat Dec 20 13:44:44 UTC 2025 - Last Modified: Thu Aug 14 05:31:44 UTC 2025 - 8.5K bytes - Viewed (0) -
fess-crawler/src/test/resources/extractor/markdown/test.md
- crawler - extractor - markdown --- # Introduction This is a sample Markdown document for testing the MarkdownExtractor. ## Features The extractor should handle: - YAML front matter extraction - Heading structure - **Bold text** and *italic text* - Lists and other formatting ### Code Examples Here is some inline `code` and a code block: ```java public class Example {
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 767 bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/UnsupportedExtractException.java
* governing permissions and limitations under the License. */ package org.codelibs.fess.crawler.exception; /** * UnsupportedExtractException is thrown when the content extraction is not supported. * It extends ExtractException and indicates that the requested extraction operation cannot be performed. * */ public class UnsupportedExtractException extends ExtractException { private static final long serialVersionUID = 1L;
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 1.2K bytes - Viewed (0)