Results 1 - 10 of 28 for Extraction (0.05 sec)

  1. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/FilenameExtractorEnhancedTest.java

        }
    
        /**
         * Test extraction with null parameters map.
         */
        public void test_getText_withNullParams() {
            final InputStream in = new ByteArrayInputStream(new byte[0]);
    
            final ExtractData result = filenameExtractor.getText(in, null);
    
            assertNotNull(result);
            assertEquals("", result.getContent());
        }
    
        /**
         * Test extraction with empty parameters map.
         */
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Mon Nov 24 03:59:47 UTC 2025
    - 7K bytes
    - Viewed (0)
  2. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/TextExtractorEnhancedTest.java

                assertTrue("Error message should indicate extraction failure", e.getMessage().contains("Failed to extract"));
            } finally {
                // Reset to default encoding
                textExtractor.setEncoding("UTF-8");
            }
        }
    
        /**
         * Test extraction with empty input stream.
         */
        public void test_getText_emptyInputStream_returnsEmptyContent() {
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Mon Nov 24 03:59:47 UTC 2025
    - 8.9K bytes
    - Viewed (0)
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/JsonExtractor.java

     * This extractor provides better structured data extraction compared to Tika's generic text extraction.
     *
     * <p>Features:
     * <ul>
     *   <li>Structured text extraction with key-value pairs</li>
     *   <li>Top-level field extraction as metadata</li>
     *   <li>Nested structure flattening with configurable depth</li>
     *   <li>Array element extraction</li>
     *   <li>Configurable field separator and array formatting</li>
     * </ul>
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 03:46:53 UTC 2025
    - 9.7K bytes
    - Viewed (0)
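
The "nested structure flattening with configurable depth" feature listed for JsonExtractor can be illustrated with a small sketch. This is not the JsonExtractor implementation: the class name `FlattenSketch`, the method `flatten`, and the `path.key` / `path[i]` naming scheme are assumptions made for illustration, and the input is assumed to be already parsed into plain `Map` and `List` values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of fess-crawler.
final class FlattenSketch {

    /** Flattens nested maps and lists into "path<separator>value" lines, descending at most maxDepth levels. */
    static List<String> flatten(final Map<String, Object> root, final int maxDepth, final String separator) {
        final List<String> lines = new ArrayList<>();
        for (final Map.Entry<String, Object> e : root.entrySet()) {
            walk(e.getKey(), e.getValue(), 1, maxDepth, separator, lines);
        }
        return lines;
    }

    private static void walk(final String path, final Object value, final int depth, final int maxDepth,
            final String separator, final List<String> out) {
        if (value instanceof Map<?, ?> && depth < maxDepth) {
            // Nested object: extend the path with each key.
            for (final Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                walk(path + "." + e.getKey(), e.getValue(), depth + 1, maxDepth, separator, out);
            }
        } else if (value instanceof List<?> && depth < maxDepth) {
            // Array: extend the path with each element index.
            final List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                walk(path + "[" + i + "]", list.get(i), depth + 1, maxDepth, separator, out);
            }
        } else {
            // Leaf value, or a structure at the depth limit: emit its string form as-is.
            out.add(path + separator + String.valueOf(value));
        }
    }
}
```

With a depth limit of 1, nested objects are not expanded and appear as a single value under their top-level key, which is one plausible reading of "configurable depth".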
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/MarkdownExtractor.java

     * This extractor provides better structured data extraction compared to Tika's generic text extraction.
     *
     * <p>Features:
     * <ul>
     *   <li>YAML front matter metadata extraction</li>
     *   <li>Heading structure extraction</li>
     *   <li>Link URL extraction</li>
     *   <li>Code block content extraction</li>
     *   <li>Clean text conversion from Markdown</li>
     *   <li>Configurable encoding</li>
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 03:46:53 UTC 2025
    - 8.2K bytes
    - Viewed (0)
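
As a rough illustration of the "YAML front matter metadata extraction" feature listed above, the sketch below pulls flat `key: value` pairs from a leading block delimited by `---` lines. It is not the MarkdownExtractor code: `FrontMatterSketch` and `parse` are hypothetical names, and real YAML (lists, nesting, quoting) would need a proper YAML parser rather than this line-based approach.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of fess-crawler.
final class FrontMatterSketch {

    /** Returns flat "key: value" pairs found between a leading pair of "---" lines. */
    static Map<String, String> parse(final String markdown) {
        final Map<String, String> meta = new LinkedHashMap<>();
        final String[] lines = markdown.split("\\R", -1);
        if (!"---".equals(lines[0].trim())) {
            return meta; // the document does not start with front matter
        }
        for (int i = 1; i < lines.length; i++) {
            final String line = lines[i];
            if ("---".equals(line.trim())) {
                return meta; // closing delimiter reached
            }
            final int colon = line.indexOf(':');
            // Only unindented scalar entries are handled; list items and nested keys are skipped.
            if (colon > 0 && !Character.isWhitespace(line.charAt(0))) {
                final String value = line.substring(colon + 1).trim();
                if (!value.isEmpty()) {
                    meta.put(line.substring(0, colon).trim(), value);
                }
            }
        }
        meta.clear(); // no closing "---": treat the input as having no front matter
        return meta;
    }
}
```

List-valued keys are deliberately skipped here, so a front matter block that holds only a list of tags would yield an empty map.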
  5. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/LhaExtractor.java

         *
         * @param in the input stream containing the LHA archive
         * @param params extraction parameters
         * @return the extracted text data
         * @throws CrawlerSystemException if the input stream is null
         * @throws ExtractException if an error occurs during extraction
         * @throws MaxLengthExceededException if the extracted content size exceeds the maximum limit
         */
        @Override
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 12:19:14 UTC 2025
    - 5.9K bytes
    - Viewed (0)
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/AbstractXmlExtractor.java

        /**
         * Default character encoding for content extraction.
         */
        protected String encoding = Constants.UTF_8;
    
        /**
         * The preload size for charset detection.
         */
        protected int preloadSizeForCharset = 2048;
    
        /**
         * Indicates whether comment tags should be ignored during extraction.
         */
        protected boolean ignoreCommentTag = false;
    
        /**
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 12:19:14 UTC 2025
    - 8.6K bytes
    - Viewed (0)
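
The `preloadSizeForCharset` field suggests that a bounded prefix of the stream is inspected to detect the charset before extraction. The sketch below shows one way such a preload could work, using `mark`/`reset` and the `encoding` attribute of the XML declaration. It is an assumption about the general technique, not the AbstractXmlExtractor implementation; `XmlEncodingSketch` and `detect` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of fess-crawler.
final class XmlEncodingSketch {

    private static final Pattern ENCODING =
            Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    /** Peeks at up to preloadSize bytes, then rewinds the stream so extraction can start from byte 0. */
    static String detect(final BufferedInputStream in, final int preloadSize, final String defaultEncoding)
            throws IOException {
        in.mark(preloadSize);
        final byte[] head = new byte[preloadSize];
        int total = 0;
        while (total < preloadSize) {
            final int n = in.read(head, total, preloadSize - total);
            if (n < 0) {
                break; // end of stream before the preload size was reached
            }
            total += n;
        }
        in.reset();
        // ISO-8859-1 maps every byte to one char, so the ASCII declaration can be matched safely.
        final String prefix = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        final Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(1) : defaultEncoding;
    }
}
```

A caller would pass the two defaults shown in the snippet, for example `detect(in, 2048, "UTF-8")`, and then read the same stream with the returned encoding.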
  7. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/EXTRACTOR_TESTS_README.md

    **Key Test Areas**:
    - Resource closure on successful extraction (MS Office extractors)
    - Resource closure on failed extraction
    - Improved error messages with context
    - Input validation using `validateInputStream()`
    
    **Covered Extractors**:
    - MsWordExtractor
    - MsExcelExtractor
    - MsPowerPointExtractor
    - TextExtractor
    
    **Test Count**: 8 tests
    
    **Key Scenarios**:
    - ✅ Successful extraction closes resources properly
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Wed Nov 19 08:55:01 UTC 2025
    - 5.7K bytes
    - Viewed (0)
  8. src/test/java/jcifs/smb1/smb1/SmbFileTest.java

                // Test file name extraction
                assertEquals("file.txt", new SmbFile("smb1://server/share/file.txt").getName());
                // Test directory name extraction (should include trailing slash)
                assertEquals("dir/", new SmbFile("smb1://server/share/dir/").getName());
                // Test share name extraction
                assertEquals("share/", new SmbFile("smb1://server/share/").getName());
    Registered: Sat Dec 20 13:44:44 UTC 2025
    - Last Modified: Thu Aug 14 05:31:44 UTC 2025
    - 8.5K bytes
    - Viewed (0)
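
The three assertions above all expect the last path segment, with the trailing slash kept for directories and shares. A minimal sketch of that rule on plain strings follows. It does not use or reproduce jcifs; `LastSegmentSketch` and `lastSegment` are hypothetical names chosen for illustration.

```java
// Hypothetical helper, not part of jcifs.
final class LastSegmentSketch {

    /** Returns the last path segment of a URL-like string, keeping a trailing slash if present. */
    static String lastSegment(final String url) {
        final int schemeEnd = url.indexOf("://");
        final String rest = schemeEnd >= 0 ? url.substring(schemeEnd + 3) : url;
        // Skip a trailing slash when looking for the separator that starts the last segment.
        final int end = rest.endsWith("/") ? rest.length() - 1 : rest.length();
        final int prev = rest.lastIndexOf('/', end - 1);
        return rest.substring(prev + 1);
    }
}
```

For the three URLs in the test snippet this helper returns `file.txt`, `dir/`, and `share/` respectively.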
  9. fess-crawler/src/test/resources/extractor/markdown/test.md

      - crawler
      - extractor
      - markdown
    ---
    
    # Introduction
    
    This is a sample Markdown document for testing the MarkdownExtractor.
    
    ## Features
    
    The extractor should handle:
    
    - YAML front matter extraction
    - Heading structure
    - **Bold text** and *italic text*
    - Lists and other formatting
    
    ### Code Examples
    
    Here is some inline `code` and a code block:
    
    ```java
    public class Example {
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 03:46:53 UTC 2025
    - 767 bytes
    - Viewed (0)
  10. fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/UnsupportedExtractException.java

     * governing permissions and limitations under the License.
     */
    package org.codelibs.fess.crawler.exception;
    
    /**
     * UnsupportedExtractException is thrown when the content extraction is not supported.
     * It extends ExtractException and indicates that the requested extraction operation cannot be performed.
     *
     */
    public class UnsupportedExtractException extends ExtractException {
    
        private static final long serialVersionUID = 1L;
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 1.2K bytes
    - Viewed (0)