Results 21 - 30 of 58 for Extraction (0.06 sec)
src/main/java/org/codelibs/fess/crawler/transformer/FessFileTransformer.java
import jakarta.annotation.PostConstruct; /** * File transformer implementation for the Fess search engine. * This transformer handles file-based document transformation and content extraction * using the Fess file transformation process with support for various file types. * * <p>It extends AbstractFessFileTransformer to provide specialized file processing
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 3.5K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/CsvExtractorTest.java
logger.info(content); // Verify header extraction assertTrue(content.contains("Name")); assertTrue(content.contains("Email")); assertTrue(content.contains("Age")); assertTrue(content.contains("Department")); // Verify data extraction assertTrue(content.contains("John Doe")); assertTrue(content.contains("******@****.***"));
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 5.3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java
* * <p>The extractor supports two types of password management: * <ul> * <li>Static passwords configured via {@link #addPassword(String, String)}</li> * <li>Dynamic passwords provided through extraction parameters</li> * </ul> * * <p>Passwords are matched against URLs or resource names using regular expression patterns. * The extractor first tries to match against the URL, then falls back to the resource name if available.
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 5.1K bytes - Viewed (0) -
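The matching order described in the PasswordBasedExtractor snippet above (URL first, then resource name) could be sketched roughly as follows. This is a minimal illustration, not the actual Fess Crawler source: the class name `PasswordLookupSketch`, the method `findPassword`, and the use of full-string `matches()` are assumptions made for the example. Only `addPassword(String, String)` is named in the snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the real PasswordBasedExtractor implementation.
public class PasswordLookupSketch {
    // Registered (pattern, password) pairs, checked in insertion order.
    private final List<Map.Entry<Pattern, String>> rules = new ArrayList<>();

    // Mirrors the addPassword(String, String) signature named in the Javadoc snippet.
    public void addPassword(final String regex, final String password) {
        rules.add(Map.entry(Pattern.compile(regex), password));
    }

    // Assumed lookup: try the URL first, then fall back to the resource name.
    public Optional<String> findPassword(final String url, final String resourceName) {
        final Optional<String> byUrl = match(url);
        return byUrl.isPresent() ? byUrl : match(resourceName);
    }

    // Returns the password of the first rule whose pattern matches the whole value.
    private Optional<String> match(final String value) {
        if (value == null) {
            return Optional.empty();
        }
        for (final Map.Entry<Pattern, String> rule : rules) {
            // Assumption: full-string match; the real class may use find() instead.
            if (rule.getKey().matcher(value).matches()) {
                return Optional.of(rule.getValue());
            }
        }
        return Optional.empty();
    }
}
```

With a rule registered as `addPassword(".*\\.pdf", "secret")`, a lookup for a URL ending in `.pdf` would resolve through the URL, while a URL without that suffix would resolve only if the resource name matched.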
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java
* <li>Handling resource names and content types</li> * <li>Retrying extraction without resource name or content type if the initial attempt fails</li> * <li>Extracting text from metadata if the main content extraction fails</li> * <li>Reading content as plain text if all other methods fail</li> * <li>Applying post-extraction filters</li> * <li>Handling Tika exceptions, including zip bomb exceptions</li> * </ul> *
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 12:19:14 UTC 2025 - 30.8K bytes - Viewed (0) -
CLAUDE.md
**Fess Crawler** is a Java-based web crawling framework for enterprise content extraction. ### Essential Info - **Language**: Java 21+ - **Build**: Maven 3.x - **License**: Apache 2.0 - **DI**: LastaFlute DI - **Repo**: https://github.com/codelibs/fess-crawler ### Tech Stack - **HTTP**: Apache HttpComponents 4.5+ - **Extraction**: Apache Tika 3.0+, POI 5.3+, PDFBox 3.0+ - **Testing**: JUnit 4, UTFlute, Mockito 5.7.0
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Fri Nov 28 17:31:34 UTC 2025 - 10.7K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/FilenameExtractor.java
* @return An ExtractData object containing the filename as content, or empty string if not found * @throws CrawlerSystemException if the input stream is null * @throws ExtractException if an unexpected error occurs during extraction */ @Override public ExtractData getText(final InputStream in, final Map<String, String> params) { validateInputStream(in); try {
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Wed Nov 19 08:55:01 UTC 2025 - 2.7K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/HtmlTransformer.java
this.propertyMap = propertyMap; } /** * Gets the map of child URL extraction rules. * * @return the child URL rule map */ public Map<String, String> getChildUrlRuleMap() { return childUrlRuleMap; } /** * Sets the map of child URL extraction rules. * * @param childUrlRuleMap the child URL rule map to set */
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sat Nov 29 07:42:33 UTC 2025 - 30.5K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/MarkdownExtractorTest.java
CloseableUtil.closeQuietly(in); final String content = extractData.getContent(); logger.info(content); // Verify plain text extraction assertTrue(content.contains("Introduction")); assertTrue(content.contains("This is a sample Markdown document")); assertTrue(content.contains("Features"));
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Mon Nov 24 03:59:47 UTC 2025 - 6.4K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/helper/ThemeHelper.java
import org.codelibs.fess.helper.PluginHelper.ArtifactType; import org.codelibs.fess.util.ResourceUtil; /** * Helper class for managing theme installation and uninstallation. * Handles the extraction and deployment of theme files from JAR artifacts. */ public class ThemeHelper { private static final Logger logger = LogManager.getLogger(ThemeHelper.class); /** * Default constructor for ThemeHelper.
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 7.1K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/helper/DocumentHelper.java
/** * Helper class for document processing and manipulation in the Fess search system. * This class provides utilities for processing document content, titles, and digests, * handling text normalization, content extraction, and similar document hash encoding/decoding. * It also manages document processing requests and integrates with the crawler system. * */ public class DocumentHelper {
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 17.4K bytes - Viewed (0)