extraction - Code Search

CLAUDE.md

**Fess Crawler** is a Java-based web crawling framework for enterprise content extraction.

### Essential Info

- **Language**: Java 21+
- **Build**: Maven 3.x
- **License**: Apache 2.0
- **DI**: LastaFlute DI
- **Repo**: https://github.com/codelibs/fess-crawler

### Tech Stack

- **HTTP**: Apache HttpComponents 4.5+ and 5.x (switchable)
- **Extraction**: Apache Tika, POI, PDFBox

Created: Sun Apr 12 03:50:13 GMT 2026

- Last Modified: Thu Mar 12 03:39:20 GMT 2026

- 8.1K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/transformer/FessXpathTransformer.java

        }
        return new URL(currentUrl);
    }

    /**
     * Gets child URL extraction rules from configuration.
     *
     * @param responseData the response data from crawling
     * @param resultData the result data
     * @return stream of tag-attribute pairs for URL extraction
     */
    @Override
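
The Javadoc above describes returning a stream of tag-attribute pairs used to find child URLs. The snippet does not show how those rules are stored, so the following is only a minimal sketch: it assumes a hypothetical comma-separated `tag:attribute` configuration string, and the class and method names are invented for illustration, not taken from Fess.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Stream;

public class ChildUrlRuleSketch {

    // Hypothetical rule format: comma-separated "tag:attribute" entries,
    // for example "a:href, img:src". This is an assumption, not the Fess format.
    public static Stream<Map.Entry<String, String>> parseRules(String config) {
        return Arrays.stream(config.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.split(":", 2))
                // Skip malformed entries that have no ':' separator.
                .filter(parts -> parts.length == 2)
                .map(parts -> Map.entry(parts[0].trim(), parts[1].trim()));
    }

    public static void main(String[] args) {
        parseRules("a:href, img:src, frame:src")
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

Returning a `Stream` rather than a collection lets a caller filter or map the pairs lazily while walking the parsed document.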

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Mar 12 01:46:45 GMT 2026

- 55.3K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java

    }

    /**
     * Gets the appropriate extractor for the given response data.
     * Selects an extractor based on the MIME type or falls back to the Tika extractor.
     *
     * @param responseData the response data containing the document to extract
     * @return the extractor instance for processing the document
     * @throws FessSystemException if no suitable extractor can be found
     */
    @Override
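
The selection logic described in this Javadoc (look up an extractor by MIME type, otherwise fall back to a default) can be sketched roughly as below. The `TextExtractor` interface, the registry map, and the use of `IllegalStateException` in place of `FessSystemException` are all assumptions made to keep the example self-contained; the real method body is not shown in the snippet.

```java
import java.util.HashMap;
import java.util.Map;

public class ExtractorSelectionSketch {

    // Hypothetical stand-in for the crawler's extractor type.
    public interface TextExtractor {
        String name();
    }

    private final Map<String, TextExtractor> byMimeType = new HashMap<>();
    private final TextExtractor fallback; // plays the role of the Tika fallback

    public ExtractorSelectionSketch(TextExtractor fallback) {
        this.fallback = fallback;
    }

    public void register(String mimeType, TextExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    // Prefer a MIME-specific extractor, then the fallback, else fail loudly.
    public TextExtractor select(String mimeType) {
        TextExtractor extractor = mimeType == null ? null : byMimeType.get(mimeType);
        if (extractor == null) {
            extractor = fallback;
        }
        if (extractor == null) {
            // The documented method throws FessSystemException here.
            throw new IllegalStateException("No extractor available for " + mimeType);
        }
        return extractor;
    }
}
```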

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Fri Nov 28 16:29:12 GMT 2025

- 3.8K bytes

github.com/gradle/gradle

.teamcity/scripts/CheckWrapper.java

    private static final Pattern ALLOWED_WRAPPER_VERSION =
        Pattern.compile("^[0-9.]+(-(rc|milestone|m)-[0-9]+)?$");

    // Keep the same extraction semantics as the old sed:
    //   sed 's/.*gradle-\(.*\)-[a-z]*\.[a-z]*/\1/'
    private static final Pattern WRAPPER_VERSION_EXTRACT =
        Pattern.compile(".*gradle-(.*)-[a-z]*\\.[a-z]*");
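
To see how these two patterns might work together, here is a small sketch that extracts a version from a distribution URL and then validates it. The two patterns are copied from the snippet; the class name, the helper methods, the use of `replaceFirst("$1")` to mirror the `sed` substitution, and the sample URLs are assumptions for illustration only.

```java
import java.util.regex.Pattern;

public class WrapperVersionSketch {

    private static final Pattern ALLOWED_WRAPPER_VERSION =
        Pattern.compile("^[0-9.]+(-(rc|milestone|m)-[0-9]+)?$");

    private static final Pattern WRAPPER_VERSION_EXTRACT =
        Pattern.compile(".*gradle-(.*)-[a-z]*\\.[a-z]*");

    // Like the sed substitution: replace the matched text with capture group 1.
    // If nothing matches, the input is returned unchanged.
    public static String extractVersion(String distributionUrl) {
        return WRAPPER_VERSION_EXTRACT.matcher(distributionUrl).replaceFirst("$1");
    }

    public static boolean isAllowed(String version) {
        return ALLOWED_WRAPPER_VERSION.matcher(version).matches();
    }

    public static void main(String[] args) {
        String url = "https://services.gradle.org/distributions/gradle-8.5-bin.zip";
        String version = extractVersion(url);
        System.out.println(version + " allowed=" + isAllowed(version));
    }
}
```

Because `(.*)` is greedy, the capture group runs up to the last `-` before the `bin.zip` style suffix, which is what keeps suffixes such as `-rc-1` inside the extracted version.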

Created: Wed Apr 01 11:36:16 GMT 2026

- Last Modified: Tue Jan 20 03:53:25 GMT 2026

- 6.4K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/helper/DocumentHelper.java

import org.codelibs.fess.crawler.exception.CrawlerSystemException;
import org.codelibs.fess.crawler.exception.CrawlingAccessException;
import org.codelibs.fess.crawler.extractor.Extractor;
import org.codelibs.fess.crawler.extractor.impl.TikaExtractor;
import org.codelibs.fess.crawler.processor.ResponseProcessor;
import org.codelibs.fess.crawler.processor.impl.DefaultResponseProcessor;
import org.codelibs.fess.crawler.rule.Rule;

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Mon Mar 30 14:27:04 GMT 2026

- 17.4K bytes

github.com/square/okhttp

okhttp/src/commonJvmAndroid/kotlin/okhttp3/internal/platform/Platform.kt

 *
 * Supported on Android 5.0+.
 *
 * Supported on OpenJDK 8 via the JettyALPN-boot library or Conscrypt.
 *
 * Supported on OpenJDK 9+ via SSLParameters and SSLSocket features.
 *
 * ### Trust Manager Extraction
 *
 * Supported on Android 2.3+ and OpenJDK 7+. There are no public APIs to recover the trust
 * manager that was used to create an [SSLSocketFactory].
 *
 * Not supported by choice on JDK9+ due to access checks.
 *

Created: Fri Apr 03 11:42:14 GMT 2026

- Last Modified: Tue Feb 03 22:17:59 GMT 2026

- 8.1K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/transformer/FessTransformer.java

import org.codelibs.fess.mylasta.direction.FessConfig;
import org.codelibs.fess.util.ComponentUtil;

/**
 * Interface for transforming and processing crawled documents in Fess.
 * Provides utility methods for URL processing, site extraction, data mapping,
 * and field configuration handling during the document transformation process.
 */
public interface FessTransformer {

    /**
     * Synchronized LRU cache for storing parent URL encodings.
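
The snippet is cut off before the cache declaration, so how Fess builds this cache is not visible here. One common way to get a synchronized LRU cache with only the JDK is an access-ordered `LinkedHashMap` wrapped by `Collections.synchronizedMap`; the sketch below shows that approach, with a hypothetical class and method name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheSketch {

    // Builds a thread-safe map that evicts the least recently used entry
    // once more than maxSize entries are stored.
    public static <K, V> Map<K, V> synchronizedLru(final int maxSize) {
        // accessOrder = true: get() and put() move an entry to the end.
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        });
    }

    public static void main(String[] args) {
        Map<String, String> cache = synchronizedLru(2);
        cache.put("http://example.com/a", "UTF-8");
        cache.put("http://example.com/b", "Shift_JIS");
        cache.get("http://example.com/a");           // touch "a" so "b" becomes eldest
        cache.put("http://example.com/c", "EUC-JP"); // evicts "b"
        System.out.println(cache.keySet());
    }
}
```

Individual calls are synchronized by the wrapper, but iterating over the map still needs an explicit `synchronized (cache)` block.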

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Dec 11 09:47:03 GMT 2025

- 14.1K bytes

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/util/MapValueExtractor.java

 */
package org.codelibs.fess.suggest.util;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Utility class for type-safe value extraction from Map objects.
 * Centralizes map access patterns to reduce code duplication and improve type safety.
 *
 * <p>This class provides methods to safely extract typed values from Map&lt;String, Object&gt;
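
The class's actual methods are not shown in this snippet. As a rough illustration of type-safe extraction from a `Map<String, Object>`, a helper along the following lines checks the runtime type before casting and returns a default otherwise. The class name, method name, and signature are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MapValueSketch {

    // Returns the value for key if it is an instance of type, else defaultValue.
    // A null map, a missing key, or a null value all yield defaultValue.
    public static <T> T getTyped(Map<String, Object> map, String key, Class<T> type, T defaultValue) {
        Object value = map == null ? null : map.get(key);
        return type.isInstance(value) ? type.cast(value) : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("text", "fess");
        source.put("score", 3);

        String text = getTyped(source, "text", String.class, "");
        Integer score = getTyped(source, "score", Integer.class, 0);
        String wrongType = getTyped(source, "score", String.class, "n/a");
        System.out.println(text + " " + score + " " + wrongType);
    }
}
```

Centralizing the `isInstance` check in one place avoids scattering unchecked casts and `ClassCastException` handling across callers.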

Created: Fri Apr 17 09:08:13 GMT 2026

- Last Modified: Sun Feb 01 12:48:24 GMT 2026

- 9.8K bytes

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/util/SuggestUtil.java

        return id;
    }

    /**
     * Parses the given query string and returns an array of keywords.
     *
     * @param q the query string to be parsed
     * @param field the field to be used for keyword extraction
     * @return an array of keywords extracted from the query string, or an empty array if the number of keywords exceeds the maximum allowed or if any keyword exceeds the maximum length
     */
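
The documented contract (return an empty array when there are too many keywords or when any keyword is too long) can be illustrated with a simplified sketch. The snippet does not show how the real method tokenizes the query or uses `field`; the version below substitutes plain whitespace splitting and takes the two limits as explicit, hypothetical parameters.

```java
public class KeywordLimitSketch {

    // Splits q on whitespace (a simplification of real query parsing) and
    // returns an empty array if either limit is exceeded.
    public static String[] parseKeywords(String q, int maxKeywords, int maxKeywordLength) {
        if (q == null || q.trim().isEmpty()) {
            return new String[0];
        }
        String[] keywords = q.trim().split("\\s+");
        if (keywords.length > maxKeywords) {
            return new String[0];
        }
        for (String keyword : keywords) {
            if (keyword.length() > maxKeywordLength) {
                return new String[0];
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(parseKeywords("fess crawler", 3, 10).length);
        System.out.println(parseKeywords("one two three four", 3, 10).length);
    }
}
```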

Created: Fri Apr 17 09:08:13 GMT 2026

- Last Modified: Sun Nov 23 11:21:40 GMT 2025

- 17.5K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/crawler/FessCrawlerThread.java

 *
 * <p>Key features include:</p>
 * <ul>
 * <li>Incremental crawling support with last-modified timestamp checking</li>
 * <li>Document expiration handling</li>
 * <li>Child URL extraction and queueing</li>
 * <li>Integration with Fess configuration and permission systems</li>
 * <li>Client selection based on URL patterns</li>
 * </ul>
 *
 * @see CrawlerThread

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Dec 11 09:47:03 GMT 2025

- 19.5K bytes
