Pushes - Code Search

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlXpathExtractor.java

import com.google.common.cache.LoadingCache;

import jakarta.annotation.Resource;

/**
 * {@link HtmlXpathExtractor} is an implementation of the {@link org.codelibs.fess.crawler.extractor.Extractor} interface.
 * It uses XPath expressions to extract text content from HTML documents.
 * <p>
 * This class provides methods to configure the XPath expressions, parser features, and properties.

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 10.3K bytes
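
The snippet above describes extracting text with XPath expressions under configurable parser features. A minimal sketch of that idea follows, assuming well-formed XHTML input and the JDK's built-in XML and XPath APIs. The class name `XPathTextSketch` and method `extractText` are hypothetical and are not the actual `HtmlXpathExtractor` API, which may rely on a different, HTML-tolerant parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the fess-crawler implementation.
class XPathTextSketch {

    // Evaluates the XPath expression and joins the text of all matching nodes with spaces.
    static String extractText(String xhtml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Example of a parser feature: reject DOCTYPE declarations (assumes the JDK's default parser).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                if (!text.isEmpty()) {
                    if (buf.length() > 0) {
                        buf.append(' ');
                    }
                    buf.append(text);
                }
            }
            return buf.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract text", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>Title</h1><p>First.</p><p>Second.</p></body></html>";
        System.out.println(extractText(page, "//p")); // prints "First. Second."
    }
}
```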

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/client/smb1/SmbClient.java

 * </ul>
 *
 * <p>
 * The client uses a {@link SmbAuthenticationHolder} to manage SMB authentication credentials.
 * It also integrates with other Fess Crawler components, such as {@link ContentLengthHelper} and
 * {@link MimeTypeHelper}, to handle content length checks and MIME type detection.
 * </p>
 *
 * <p>
 * The class uses JCIFS properties to configure the SMB connection.
 * </p>
 *
 * <p>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Sep 18 09:30:45 UTC 2025

- 23K bytes

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/index/contents/document/ESSourceReader.java

 * or a fixed number. It also allows filtering documents based on their size, using the {@code limitOfDocumentSize}
 * parameter.
 * </p>
 *
 * <p>
 * The reader uses a queue to buffer documents read from Elasticsearch, and it retries failed requests
 * up to a maximum number of times.
 * </p>
 *
 * <p>
 * <b>Usage:</b>
 * </p>
 * <pre>
 * {@code

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Thu Aug 07 02:41:28 UTC 2025

- 11K bytes
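
The snippet above mentions three behaviours: a queue that buffers documents, retries of failed requests up to a maximum, and filtering by `limitOfDocumentSize`. The sketch below shows one way these could fit together. Everything here is an assumption for illustration: `BufferedRetryReader`, its constructor, and the `fetchPage` function stand in for the real Elasticsearch calls and are not the `ESSourceReader` API.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.IntFunction;

// Hypothetical sketch, not the fess-suggest implementation.
class BufferedRetryReader {
    private final Queue<String> buffer = new ArrayDeque<>();
    // Assumed data source: page index -> documents, empty list when no more pages exist.
    private final IntFunction<List<String>> fetchPage;
    private final int maxRetries;
    private final int limitOfDocumentSize;
    private int nextPage = 0;
    private boolean exhausted = false;

    BufferedRetryReader(IntFunction<List<String>> fetchPage, int maxRetries, int limitOfDocumentSize) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.fetchPage = fetchPage;
        this.maxRetries = maxRetries;
        this.limitOfDocumentSize = limitOfDocumentSize;
    }

    // Returns the next buffered document, or null when the source is exhausted.
    String read() {
        while (buffer.isEmpty() && !exhausted) {
            fill();
        }
        return buffer.poll();
    }

    // Fetches one page, retrying on failure, and buffers documents within the size limit.
    private void fill() {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                List<String> page = fetchPage.apply(nextPage);
                nextPage++;
                if (page.isEmpty()) {
                    exhausted = true;
                    return;
                }
                for (String doc : page) {
                    if (doc.length() <= limitOfDocumentSize) {
                        buffer.add(doc);
                    }
                }
                return;
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}
```

In this sketch a page that fails is requested again with the same index, so a transient error does not skip documents.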

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/converter/KatakanaToAlphabetConverter.java

 *
 * <p>
 * This class implements the {@link ReadingConverter} interface and provides a method to convert a given
 * Katakana string into a list of possible Alphabet readings. It uses a predefined mapping of Katakana
 * characters to their Alphabet equivalents, handling both single and double Katakana character combinations.
 * </p>
 *
 * <p>

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Fri Jul 04 14:00:23 UTC 2025

- 10.8K bytes
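
The snippet above describes mapping Katakana to a list of possible alphabet readings, covering both single characters and two-character combinations. The following sketch illustrates that approach with a tiny, made-up table. `KatakanaSketch`, its `TABLE` contents, and the pass-through rule for unmapped characters are assumptions and do not reproduce the mapping used by `KatakanaToAlphabetConverter`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the fess-suggest implementation.
class KatakanaSketch {
    // Illustrative subset only: each Katakana unit maps to one or more romanizations.
    private static final Map<String, String[]> TABLE = new HashMap<>();
    static {
        TABLE.put("キャ", new String[] { "kya" });
        TABLE.put("シャ", new String[] { "sha", "sya" });
        TABLE.put("カ", new String[] { "ka" });
        TABLE.put("キ", new String[] { "ki" });
        TABLE.put("シ", new String[] { "shi", "si" });
        TABLE.put("ス", new String[] { "su" });
    }

    // Returns every reading obtained by combining the alternatives of each unit.
    static List<String> convert(String katakana) {
        List<String> readings = new ArrayList<>();
        readings.add("");
        int i = 0;
        while (i < katakana.length()) {
            String[] alternatives = null;
            int step = 1;
            // Prefer a two-character combination when the table has one.
            if (i + 2 <= katakana.length()) {
                alternatives = TABLE.get(katakana.substring(i, i + 2));
                if (alternatives != null) {
                    step = 2;
                }
            }
            if (alternatives == null) {
                String single = katakana.substring(i, i + 1);
                alternatives = TABLE.get(single);
                if (alternatives == null) {
                    alternatives = new String[] { single }; // pass unmapped characters through
                }
            }
            List<String> next = new ArrayList<>();
            for (String prefix : readings) {
                for (String alternative : alternatives) {
                    next.add(prefix + alternative);
                }
            }
            readings = next;
            i += step;
        }
        return readings;
    }

    public static void main(String[] args) {
        System.out.println(convert("スシ")); // prints [sushi, susi]
    }
}
```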

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/processor/impl/DefaultResponseProcessor.java

 * result and storing it. Unsuccessful responses are logged for debugging purposes.
 * </p>
 *
 * <p>
 * The class uses {@link CrawlerContainer} to access components like {@link AccessResult}
 * and {@link UrlQueue}. It also uses {@link CrawlingParameterUtil} to access services
 * like {@link UrlQueueService} and DataService, as well as the {@link CrawlerContext}.
 * </p>
 *
 * <p>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 12.5K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/SitemapsHelper.java

 * and can handle GZIP compressed sitemaps.
 * The class provides methods to check if an input stream is a valid sitemap,
 * and to parse an input stream into a {@link SitemapSet} object.
 * It uses a SAX parser for XML sitemaps and XML sitemap indexes,
 * and handles potential exceptions during parsing.
 * The class also includes inner classes for handling XML sitemap and sitemap index parsing.
 */

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 14.7K bytes
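
The snippet above says the helper handles GZIP-compressed sitemaps and parses XML sitemaps with a SAX parser. A minimal sketch of both steps follows, assuming GZIP detection by the two magic bytes `0x1f 0x8b` and collecting only `<loc>` values. `SitemapSketch`, `parseLocs`, and `gzip` are hypothetical names; the real `SitemapsHelper` returns a `SitemapSet` and may detect formats differently.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the fess-crawler implementation.
class SitemapSketch {

    // Reads an XML sitemap (plain or GZIP-compressed) and returns the <loc> values in order.
    static List<String> parseLocs(InputStream in) {
        try {
            BufferedInputStream buffered = new BufferedInputStream(in);
            buffered.mark(2);
            int b1 = buffered.read();
            int b2 = buffered.read();
            buffered.reset();
            // GZIP streams start with the magic bytes 0x1f 0x8b.
            InputStream source = (b1 == 0x1f && b2 == 0x8b) ? new GZIPInputStream(buffered) : buffered;

            List<String> locs = new ArrayList<>();
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
                private StringBuilder text;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    if ("loc".equals(qName)) {
                        text = new StringBuilder();
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (text != null) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("loc".equals(qName)) {
                        locs.add(text.toString().trim());
                        text = null;
                    }
                }
            });
            return locs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sitemap", e);
        }
    }

    // Helper for trying the sketch with compressed input.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://example.com/a</loc></url></urlset>";
        byte[] compressed = gzip(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseLocs(new ByteArrayInputStream(compressed))); // prints [https://example.com/a]
    }
}
```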

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/impl/LogHelperImpl.java

import org.codelibs.fess.crawler.log.LogType;

/**
 * Implementation of the {@link LogHelper} interface.
 * This class provides methods for logging various events during the crawling process.
 * It uses Log4j2 for logging.
 *
 * <p>
 * The class contains methods for logging different types of events, such as:
 * </p>
 * <ul>
 *   <li>Starting and finishing threads</li>
 *   <li>Starting and cleaning up crawling</li>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 14K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/XmlTransformer.java

import com.google.common.cache.LoadingCache;

import jakarta.annotation.Resource;

/**
 * <p>
 * XmlTransformer is a class that extends AbstractTransformer to transform XML documents into a specific format for indexing.
 * It uses XPath expressions to extract data from the XML and stores it in a ResultData object.
 * </p>
 *
 * <p>
 * This class provides several configuration options to customize the XML parsing process, such as:
 * </p>
 * <ul>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 23.9K bytes

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/index/contents/DefaultContentsParser.java

/**
 * DefaultContentsParser is an implementation of the ContentsParser interface.
 * It provides methods to parse search words, query logs, and documents into SuggestItem objects.
 *
 * <p>This class uses various utilities such as ReadingConverter, Normalizer, and SuggestAnalyzer
 * to process and analyze the input data.</p>
 *
 * <p>It also handles the exclusion of search words based on certain criteria and manages the

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Fri Jul 04 14:00:23 UTC 2025

- 15.4K bytes

github.com/codelibs/fess-crawler

README.md

// Add file URL
crawler.addUrl("file:///path/to/directory");
crawler.urlFilter.addInclude("file:///path/to/directory/.*");
```

## Configuration

### XML Configuration

Fess Crawler uses XML-based configuration with LastaFlute DI. Place configuration files in your classpath:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- crawler.xml -->
<!DOCTYPE components PUBLIC "-//DBFLUTE//DTD LastaDi 1.0//EN"

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Aug 31 05:32:52 UTC 2025

- 15.3K bytes
