Search Options

Results per page
Sort
Preferred Languages
Advance

Results 11 - 20 of 130 for extraction (0.04 sec)

  1. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/EXTRACTOR_TESTS_README.md

    # Extractor Implementation Tests
    
    This directory contains comprehensive tests for the Extractor implementations, focusing on the improvements made to resource management, error handling, and input validation.
    
    ## Test Files Overview
    
    ### 1. ExtractorResourceManagementTest.java
    **Purpose**: Verify proper resource management in Extractor implementations.
    
    **Key Test Areas**:
    - Resource closure on successful extraction (MS Office extractors)
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Wed Nov 19 08:55:01 UTC 2025
    - 5.7K bytes
    - Viewed (0)
  2. README.md

    ## Overview
    
    **Fess Crawler** is a powerful, flexible Java-based web crawling framework designed for enterprise-scale content extraction and processing. Built with a modular architecture, it supports multiple protocols (HTTP/HTTPS, File System, FTP, SMB, Cloud Storage) and provides extensive content extraction capabilities from various document formats.
    
    ### Key Features
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
  3. src/test/java/jcifs/smb1/smb1/SmbFileTest.java

                // Test file name extraction
                assertEquals("file.txt", new SmbFile("smb1://server/share/file.txt").getName());
                // Test directory name extraction (should include trailing slash)
                assertEquals("dir/", new SmbFile("smb1://server/share/dir/").getName());
                // Test share name extraction
                assertEquals("share/", new SmbFile("smb1://server/share/").getName());
    Registered: Sat Dec 20 13:44:44 UTC 2025
    - Last Modified: Thu Aug 14 05:31:44 UTC 2025
    - 8.5K bytes
    - Viewed (0)
  4. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/JsonExtractorTest.java

        public void test_getText_nested() {
            final InputStream in = ResourceUtil.getResourceAsStream("extractor/json/test.json");
            final ExtractData extractData = jsonExtractor.getText(in, null);
            CloseableUtil.closeQuietly(in);
    
            final String content = extractData.getContent();
    
            // Verify nested content extraction
            assertTrue(content.contains("content.summary"));
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 03:46:53 UTC 2025
    - 4.7K bytes
    - Viewed (0)
  5. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/CsvExtractor.java

    /**
     * Extracts text content and metadata from CSV files.
     * This extractor provides better structured data extraction compared to Tika's generic text extraction.
     *
     * <p>Features:
     * <ul>
     *   <li>Automatic delimiter detection (comma, tab, semicolon, pipe)</li>
     *   <li>Header row detection and extraction</li>
     *   <li>Column name to data value association</li>
     *   <li>Quoted field handling</li>
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Thu Dec 11 08:38:29 UTC 2025
    - 12.8K bytes
    - Viewed (0)
  6. fess-crawler/src/test/resources/extractor/markdown/test.md

    ---
    title: Sample Markdown Document
    author: John Doe
    date: 2025-01-15
    tags:
      - crawler
      - extractor
      - markdown
    ---
    
    # Introduction
    
    This is a sample Markdown document for testing the MarkdownExtractor.
    
    ## Features
    
    The extractor should handle:
    
    - YAML front matter extraction
    - Heading structure
    - **Bold text** and *italic text*
    - Lists and other formatting
    
    ### Code Examples
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 03:46:53 UTC 2025
    - 767 bytes
    - Viewed (0)
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java

     *   <li>Handling resource names and content types</li>
     *   <li>Retrying extraction without resource name or content type if the initial attempt fails</li>
     *   <li>Extracting text from metadata if the main content extraction fails</li>
     *   <li>Reading content as plain text if all other methods fail</li>
     *   <li>Applying post-extraction filters</li>
     *   <li>Handling Tika exceptions, including zip bomb exceptions</li>
     * </ul>
     *
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 12:19:14 UTC 2025
    - 30.8K bytes
    - Viewed (0)
  8. internal/s3select/jstream/README.md

    #
    
    [![GoDoc](https://godoc.org/github.com/bcicen/jstream?status.svg)](https://godoc.org/github.com/bcicen/jstream)
    
    
    `jstream` is a streaming JSON parser and value extraction library for Go.
    
    Unlike most JSON parsers, `jstream` is document position- and depth-aware -- this enables the extraction of values at a specified depth, eliminating the overhead of allocating encompassing arrays or objects; e.g:
    
    Using the below example document:
    Registered: Sun Dec 28 19:28:13 UTC 2025
    - Last Modified: Mon Sep 23 19:35:41 UTC 2024
    - 3.2K bytes
    - Viewed (0)
  9. fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/UnsupportedExtractException.java

     * governing permissions and limitations under the License.
     */
    package org.codelibs.fess.crawler.exception;
    
    /**
     * UnsupportedExtractException is thrown when the content extraction is not supported.
     * It extends ExtractException and indicates that the requested extraction operation cannot be performed.
     *
     */
    public class UnsupportedExtractException extends ExtractException {
    
        private static final long serialVersionUID = 1L;
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 1.2K bytes
    - Viewed (0)
  10. src/main/java/org/codelibs/fess/crawler/transformer/FessXpathTransformer.java

            }
            return URI.create(currentUrl);
        }
    
        /**
         * Gets child URL extraction rules from configuration.
         *
         * @param responseData the response data from crawling
         * @param resultData the result data
         * @return stream of tag-attribute pairs for URL extraction
         */
        @Override
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Dec 12 13:58:40 UTC 2025
    - 54.6K bytes
    - Viewed (0)
Back to top