Results 11 - 20 of 130 for extraction (0.04 sec)
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/EXTRACTOR_TESTS_README.md
# Extractor Implementation Tests This directory contains comprehensive tests for the Extractor implementations, focusing on the improvements made to resource management, error handling, and input validation. ## Test Files Overview ### 1. ExtractorResourceManagementTest.java **Purpose**: Verify proper resource management in Extractor implementations. **Key Test Areas**: - Resource closure on successful extraction (MS Office extractors)
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Wed Nov 19 08:55:01 UTC 2025 - 5.7K bytes - Viewed (0) -
README.md
## Overview **Fess Crawler** is a powerful, flexible Java-based web crawling framework designed for enterprise-scale content extraction and processing. Built with a modular architecture, it supports multiple protocols (HTTP/HTTPS, File System, FTP, SMB, Cloud Storage) and provides extensive content extraction capabilities from various document formats. ### Key Features
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes - Viewed (0) -
src/test/java/jcifs/smb1/smb1/SmbFileTest.java
// Test file name extraction
assertEquals("file.txt", new SmbFile("smb1://server/share/file.txt").getName());
// Test directory name extraction (should include trailing slash)
assertEquals("dir/", new SmbFile("smb1://server/share/dir/").getName());
// Test share name extraction
assertEquals("share/", new SmbFile("smb1://server/share/").getName());
Registered: Sat Dec 20 13:44:44 UTC 2025 - Last Modified: Thu Aug 14 05:31:44 UTC 2025 - 8.5K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/JsonExtractorTest.java
public void test_getText_nested() {
    final InputStream in = ResourceUtil.getResourceAsStream("extractor/json/test.json");
    final ExtractData extractData = jsonExtractor.getText(in, null);
    CloseableUtil.closeQuietly(in);
    final String content = extractData.getContent();
    // Verify nested content extraction
    assertTrue(content.contains("content.summary"));
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 4.7K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/CsvExtractor.java
/**
 * Extracts text content and metadata from CSV files.
 * This extractor provides better structured data extraction compared to Tika's generic text extraction.
 *
 * <p>Features:
 * <ul>
 * <li>Automatic delimiter detection (comma, tab, semicolon, pipe)</li>
 * <li>Header row detection and extraction</li>
 * <li>Column name to data value association</li>
 * <li>Quoted field handling</li>
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Thu Dec 11 08:38:29 UTC 2025 - 12.8K bytes - Viewed (0) -
fess-crawler/src/test/resources/extractor/markdown/test.md
---
title: Sample Markdown Document
author: John Doe
date: 2025-01-15
tags:
  - crawler
  - extractor
  - markdown
---
# Introduction
This is a sample Markdown document for testing the MarkdownExtractor.
## Features
The extractor should handle:
- YAML front matter extraction
- Heading structure
- **Bold text** and *italic text*
- Lists and other formatting
### Code Examples
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 767 bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java
 * <li>Handling resource names and content types</li>
 * <li>Retrying extraction without resource name or content type if the initial attempt fails</li>
 * <li>Extracting text from metadata if the main content extraction fails</li>
 * <li>Reading content as plain text if all other methods fail</li>
 * <li>Applying post-extraction filters</li>
 * <li>Handling Tika exceptions, including zip bomb exceptions</li>
 * </ul>
 *
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 12:19:14 UTC 2025 - 30.8K bytes - Viewed (0) -
internal/s3select/jstream/README.md
# [](https://godoc.org/github.com/bcicen/jstream) `jstream` is a streaming JSON parser and value extraction library for Go. Unlike most JSON parsers, `jstream` is document position- and depth-aware -- this enables the extraction of values at a specified depth, eliminating the overhead of allocating encompassing arrays or objects; e.g: Using the below example document:
Registered: Sun Dec 28 19:28:13 UTC 2025 - Last Modified: Mon Sep 23 19:35:41 UTC 2024 - 3.2K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/UnsupportedExtractException.java
 * governing permissions and limitations under the License.
 */
package org.codelibs.fess.crawler.exception;

/**
 * UnsupportedExtractException is thrown when the content extraction is not supported.
 * It extends ExtractException and indicates that the requested extraction operation cannot be performed.
 *
 */
public class UnsupportedExtractException extends ExtractException {
    private static final long serialVersionUID = 1L;
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 1.2K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/crawler/transformer/FessXpathTransformer.java
}
return URI.create(currentUrl);
}

/**
 * Gets child URL extraction rules from configuration.
 *
 * @param responseData the response data from crawling
 * @param resultData the result data
 * @return stream of tag-attribute pairs for URL extraction
 */
@Override
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Dec 12 13:58:40 UTC 2025 - 54.6K bytes - Viewed (0)