Extraction - Code Search

README.md

## Overview

**Fess Crawler** is a powerful, flexible Java-based web crawling framework designed for enterprise-scale content extraction and processing. Built with a modular architecture, it supports multiple protocols (HTTP/HTTPS, File System, FTP, SMB, Cloud Storage) and provides extensive content extraction capabilities from various document formats.

### Key Features

Created: Sun Apr 12 03:50:13 GMT 2026

- Last Modified: Sun Aug 31 05:32:52 GMT 2025

- 15.3K bytes


github.com/codelibs/jcifs

src/test/java/jcifs/smb1/smb1/SmbFileTest.java

            // Test file name extraction
            assertEquals("file.txt", new SmbFile("smb1://server/share/file.txt").getName());
            // Test directory name extraction (should include trailing slash)
            assertEquals("dir/", new SmbFile("smb1://server/share/dir/").getName());
            // Test share name extraction
            assertEquals("share/", new SmbFile("smb1://server/share/").getName());
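
The assertions above encode a naming convention: the name is the last path segment, and directories and shares keep their trailing slash. A minimal sketch of that convention using plain Java string handling follows. `SmbNameSketch` and `lastSegment` are hypothetical names introduced for illustration; they are not part of jcifs, whose actual implementation may differ.

```java
// Hypothetical illustration only; this is not jcifs code.
final class SmbNameSketch {

    /**
     * Returns the last path segment of an SMB-style URL, keeping a trailing
     * slash when the URL ends with one (the convention the test asserts).
     */
    static String lastSegment(String url) {
        // Skip a trailing slash when searching for the separator that precedes the last segment.
        int searchFrom = url.endsWith("/") ? url.length() - 2 : url.length() - 1;
        int slash = url.lastIndexOf('/', searchFrom);
        return url.substring(slash + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("smb1://server/share/file.txt"));
        System.out.println(lastSegment("smb1://server/share/dir/"));
        System.out.println(lastSegment("smb1://server/share/"));
    }
}
```

The sketch works on the raw string only, so it does not validate the scheme or distinguish a share from a directory; it simply mirrors the three cases in the test.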

Created: Sun Apr 05 00:10:12 GMT 2026

- Last Modified: Thu Aug 14 05:31:44 GMT 2025

- 8.5K bytes


github.com/minio/minio

internal/s3select/jstream/README.md

[![GoDoc](https://godoc.org/github.com/bcicen/jstream?status.svg)](https://godoc.org/github.com/bcicen/jstream)


`jstream` is a streaming JSON parser and value extraction library for Go.

Unlike most JSON parsers, `jstream` is aware of document position and depth. This enables the extraction of values at a specified depth and eliminates the overhead of allocating the encompassing arrays or objects.

Using the below example document:

Created: Sun Apr 05 19:28:12 GMT 2026

- Last Modified: Mon Sep 23 19:35:41 GMT 2024

- 3.2K bytes


github.com/codelibs/fess-crawler

CLAUDE.md

**Fess Crawler** is a Java-based web crawling framework for enterprise content extraction.

### Essential Info

- **Language**: Java 21+
- **Build**: Maven 3.x
- **License**: Apache 2.0
- **DI**: LastaFlute DI
- **Repo**: https://github.com/codelibs/fess-crawler

### Tech Stack

- **HTTP**: Apache HttpComponents 4.5+ and 5.x (switchable)
- **Extraction**: Apache Tika, POI, PDFBox

Created: Sun Apr 12 03:50:13 GMT 2026

- Last Modified: Thu Mar 12 03:39:20 GMT 2026

- 8.1K bytes


github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/transformer/FessXpathTransformer.java

        }
        return new URL(currentUrl);
    }

    /**
     * Gets child URL extraction rules from configuration.
     *
     * @param responseData the response data from crawling
     * @param resultData the result data
     * @return stream of tag-attribute pairs for URL extraction
     */
    @Override

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Mar 12 01:46:45 GMT 2026

- 55.3K bytes


github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/transformer/FessFileTransformer.java

import jakarta.annotation.PostConstruct;

/**
 * File transformer implementation for the Fess search engine.
 * This transformer handles file-based document transformation and content extraction
 * using the Fess file transformation process with support for various file types.
 *
 * <p>It extends AbstractFessFileTransformer to provide specialized file processing

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Fri Nov 28 16:29:12 GMT 2025

- 3.5K bytes


github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java

import org.codelibs.fess.util.ComponentUtil;

import jakarta.annotation.PostConstruct;

/**
 * Standard transformer implementation for the Fess search engine.
 * This transformer handles document transformation and content extraction using
 * the standard Fess file transformation process with support for various content types.
 *
 * <p>It extends AbstractFessFileTransformer to provide file-specific transformation

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Fri Nov 28 16:29:12 GMT 2025

- 3.8K bytes


github.com/gradle/gradle

.teamcity/scripts/CheckWrapper.java

    private static final Pattern ALLOWED_WRAPPER_VERSION =
        Pattern.compile("^[0-9.]+(-(rc|milestone|m)-[0-9]+)?$");

    // Keep the same extraction semantics as the old sed:
    //   sed 's/.*gradle-\(.*\)-[a-z]*\.[a-z]*/\1/'
    private static final Pattern WRAPPER_VERSION_EXTRACT =
        Pattern.compile(".*gradle-(.*)-[a-z]*\\.[a-z]*");
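
To show how the two patterns above could work together, here is a minimal sketch that extracts a version from a wrapper distribution URL and then validates it. The class name `WrapperVersionSketch`, the helper methods, and the sample URLs are assumptions made for illustration; the real `CheckWrapper.java` may apply the patterns differently (for example, the old `sed` command leaves non-matching input unchanged, whereas this sketch returns `null`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; only the two patterns are taken from the snippet above.
final class WrapperVersionSketch {

    private static final Pattern ALLOWED_WRAPPER_VERSION =
        Pattern.compile("^[0-9.]+(-(rc|milestone|m)-[0-9]+)?$");

    private static final Pattern WRAPPER_VERSION_EXTRACT =
        Pattern.compile(".*gradle-(.*)-[a-z]*\\.[a-z]*");

    /** Returns the captured version, or null when the whole URL does not match (an assumption of this sketch). */
    static String extractVersion(String distributionUrl) {
        Matcher m = WRAPPER_VERSION_EXTRACT.matcher(distributionUrl);
        return m.matches() ? m.group(1) : null;
    }

    /** True when the version is a release, release candidate, or milestone according to the allow-list pattern. */
    static boolean isAllowed(String version) {
        return version != null && ALLOWED_WRAPPER_VERSION.matcher(version).matches();
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration.
        String[] urls = {
            "https://services.gradle.org/distributions/gradle-8.5-bin.zip",
            "https://services.gradle.org/distributions/gradle-8.5-rc-1-bin.zip",
            "https://services.gradle.org/distributions-snapshots/gradle-9.0.0-20250101000000+0000-bin.zip"
        };
        for (String url : urls) {
            String version = extractVersion(url);
            System.out.println(version + " allowed=" + isAllowed(version));
        }
    }
}
```

Because the capture group is greedy, everything between `gradle-` and the final `-<type>.<ext>` suffix is captured, so a release-candidate suffix such as `-rc-1` stays part of the version and is then accepted or rejected by the allow-list pattern.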

Created: Wed Apr 01 11:36:16 GMT 2026

- Last Modified: Tue Jan 20 03:53:25 GMT 2026

- 6.4K bytes


github.com/apache/maven

impl/maven-cli/src/test/java/org/apache/maven/cling/invoker/mvnup/goals/GAVUtilsTest.java

/**
 * Tests Artifact extraction, computation, and parent resolution functionality.
 */
@DisplayName("GAVUtils")
class GAVUtilsTest {

    @BeforeEach
    void setUp() {}

    private UpgradeContext createMockContext() {
        return TestUtils.createMockContext();
    }

    @Nested
    @DisplayName("Artifact Extraction")
    class GAVExtractionTests {

        @Test

Created: Sun Apr 05 03:35:12 GMT 2026

- Last Modified: Tue Nov 18 18:03:26 GMT 2025

- 17.3K bytes


github.com/codelibs/fess

src/main/java/org/codelibs/fess/helper/ThemeHelper.java

import org.codelibs.fess.helper.PluginHelper.ArtifactType;
import org.codelibs.fess.util.ResourceUtil;

/**
 * Helper class for managing theme installation and uninstallation.
 * Handles the extraction and deployment of theme files from JAR artifacts.
 */
public class ThemeHelper {
    private static final Logger logger = LogManager.getLogger(ThemeHelper.class);

    /**
     * Default constructor for ThemeHelper.

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Fri Nov 28 16:29:12 GMT 2025

- 7.1K bytes

