Results 21 - 30 of 58 for Extraction (0.06 sec)

  1. src/main/java/org/codelibs/fess/crawler/transformer/FessFileTransformer.java

    import jakarta.annotation.PostConstruct;
    
    /**
     * File transformer implementation for the Fess search engine.
     * This transformer handles file-based document transformation and content extraction
     * using the Fess file transformation process with support for various file types.
     *
     * <p>It extends AbstractFessFileTransformer to provide specialized file processing
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 3.5K bytes
  2. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/CsvExtractorTest.java

            logger.info(content);
    
            // Verify header extraction
            assertTrue(content.contains("Name"));
            assertTrue(content.contains("Email"));
            assertTrue(content.contains("Age"));
            assertTrue(content.contains("Department"));
    
            // Verify data extraction
            assertTrue(content.contains("John Doe"));
            assertTrue(content.contains("******@****.***"));
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 03:46:53 UTC 2025
    - 5.3K bytes
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java

     *
     * <p>The extractor supports two types of password management:
     * <ul>
     *   <li>Static passwords configured via {@link #addPassword(String, String)}</li>
     *   <li>Dynamic passwords provided through extraction parameters</li>
     * </ul>
     *
     * <p>Passwords are matched against URLs or resource names using regular expression patterns.
     * The extractor first tries to match against the URL, then falls back to the resource name if available.
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 5.1K bytes
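
The URL-first, resource-name-second lookup described in the PasswordBasedExtractor Javadoc above can be sketched roughly as follows. This is an illustration only, not the Fess Crawler code: the class `PasswordLookupSketch`, the `findPassword` method, the registration-order precedence, and the choice of a full-string regex match are all assumptions. Only the `addPassword(String, String)` name comes from the snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual PasswordBasedExtractor implementation.
class PasswordLookupSketch {
    // Patterns are checked in registration order (an assumption of this sketch).
    private final List<Map.Entry<Pattern, String>> passwords = new ArrayList<>();

    // Registers a static password for every URL or resource name matching the regex.
    void addPassword(final String regex, final String password) {
        passwords.add(Map.entry(Pattern.compile(regex), password));
    }

    // Tries the URL first, then falls back to the resource name.
    Optional<String> findPassword(final String url, final String resourceName) {
        final Optional<String> byUrl = lookup(url);
        return byUrl.isPresent() ? byUrl : lookup(resourceName);
    }

    // Returns the password of the first pattern that fully matches the value.
    private Optional<String> lookup(final String value) {
        if (value == null) {
            return Optional.empty();
        }
        for (final Map.Entry<Pattern, String> entry : passwords) {
            if (entry.getKey().matcher(value).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

With a pattern such as `.*\.pdf` registered, a call like `findPassword(url, resourceName)` returns that pattern's password whenever the URL ends in `.pdf`, and consults the resource name only if no pattern matches the URL.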
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java

     *   <li>Handling resource names and content types</li>
     *   <li>Retrying extraction without resource name or content type if the initial attempt fails</li>
     *   <li>Extracting text from metadata if the main content extraction fails</li>
     *   <li>Reading content as plain text if all other methods fail</li>
     *   <li>Applying post-extraction filters</li>
     *   <li>Handling Tika exceptions, including zip bomb exceptions</li>
     * </ul>
     *
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 12:19:14 UTC 2025
    - 30.8K bytes
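
The fallback behaviour listed in the TikaExtractor Javadoc above (retry with less information, then metadata, then plain text, then post-extraction filters) follows a general "try each strategy in order" pattern. The sketch below shows that pattern in isolation. It does not call Apache Tika and is not the TikaExtractor code: `FallbackExtractionSketch`, `Attempt`, and `extractWithFallback` are hypothetical names, and treating blank output as a failed attempt is an assumption.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of an ordered fallback chain; no Tika APIs are used.
class FallbackExtractionSketch {

    // One extraction strategy; it may throw or return blank text on failure.
    @FunctionalInterface
    interface Attempt {
        String extract() throws Exception;
    }

    // Runs the attempts in order and returns the first non-blank result,
    // passed through the post-extraction filter.
    static String extractWithFallback(final List<Attempt> attempts, final UnaryOperator<String> postFilter) {
        Exception last = null;
        for (final Attempt attempt : attempts) {
            try {
                final String text = attempt.extract();
                if (text != null && !text.isBlank()) {
                    return postFilter.apply(text);
                }
            } catch (final Exception e) {
                last = e; // remember the failure and move on to the next strategy
            }
        }
        throw new IllegalStateException("All extraction attempts failed", last);
    }
}
```

In this arrangement the caller would order the list from the most specific strategy (for example, parsing with resource name and content type) to the most generic one (reading the stream as plain text), so that cheaper guesses are only used when richer ones fail.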
  5. CLAUDE.md

    **Fess Crawler** is a Java-based web crawling framework for enterprise content extraction.
    
    ### Essential Info
    
    - **Language**: Java 21+
    - **Build**: Maven 3.x
    - **License**: Apache 2.0
    - **DI**: LastaFlute DI
    - **Repo**: https://github.com/codelibs/fess-crawler
    
    ### Tech Stack
    
    - **HTTP**: Apache HttpComponents 4.5+
    - **Extraction**: Apache Tika 3.0+, POI 5.3+, PDFBox 3.0+
    - **Testing**: JUnit 4, UTFlute, Mockito 5.7.0
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Fri Nov 28 17:31:34 UTC 2025
    - 10.7K bytes
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/FilenameExtractor.java

         * @return An ExtractData object containing the filename as content, or empty string if not found
         * @throws CrawlerSystemException if the input stream is null
         * @throws ExtractException if an unexpected error occurs during extraction
         */
        @Override
        public ExtractData getText(final InputStream in, final Map<String, String> params) {
            validateInputStream(in);
            try {
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Wed Nov 19 08:55:01 UTC 2025
    - 2.7K bytes
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/HtmlTransformer.java

            this.propertyMap = propertyMap;
        }
    
        /**
         * Gets the map of child URL extraction rules.
         *
         * @return the child URL rule map
         */
        public Map<String, String> getChildUrlRuleMap() {
            return childUrlRuleMap;
        }
    
        /**
         * Sets the map of child URL extraction rules.
         *
         * @param childUrlRuleMap the child URL rule map to set
         */
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sat Nov 29 07:42:33 UTC 2025
    - 30.5K bytes
  8. fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/MarkdownExtractorTest.java

            CloseableUtil.closeQuietly(in);
    
            final String content = extractData.getContent();
            logger.info(content);
    
            // Verify plain text extraction
            assertTrue(content.contains("Introduction"));
            assertTrue(content.contains("This is a sample Markdown document"));
            assertTrue(content.contains("Features"));
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Mon Nov 24 03:59:47 UTC 2025
    - 6.4K bytes
  9. src/main/java/org/codelibs/fess/helper/ThemeHelper.java

    import org.codelibs.fess.helper.PluginHelper.ArtifactType;
    import org.codelibs.fess.util.ResourceUtil;
    
    /**
     * Helper class for managing theme installation and uninstallation.
     * Handles the extraction and deployment of theme files from JAR artifacts.
     */
    public class ThemeHelper {
        private static final Logger logger = LogManager.getLogger(ThemeHelper.class);
    
        /**
         * Default constructor for ThemeHelper.
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 7.1K bytes
  10. src/main/java/org/codelibs/fess/helper/DocumentHelper.java

    /**
     * Helper class for document processing and manipulation in the Fess search system.
     * This class provides utilities for processing document content, titles, and digests,
     * handling text normalization, content extraction, and similar document hash encoding/decoding.
     * It also manages document processing requests and integrates with the crawler system.
     *
     */
    public class DocumentHelper {
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 17.4K bytes