urlset - Code Search

fess-crawler/src/main/java/org/codelibs/fess/crawler/service/impl/UrlQueueServiceImpl.java

 * This class provides methods for managing a queue of URLs to be crawled,
 * including adding, deleting, and retrieving URLs from the queue.
 * It uses a {@link MemoryDataHelper} to store the URL queue data in memory.
 *
 * <p>
 * The class is responsible for:
 * </p>
 * <ul>
 *   <li>Updating session IDs for URL queues.</li>
 *   <li>Adding new URLs to the queue.</li>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 9.3K bytes

- Viewed (0)

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/SitemapFile.java

 * This class holds information about a single Sitemap, including its location and last modification timestamp.
 * It implements the {@link Sitemap} interface.
 *
 * <p>
 * A Sitemap file provides search engines with a list of URLs available for crawling.
 * This class encapsulates the essential attributes of a Sitemap entry, allowing for efficient management
 * and processing of Sitemap data.
 * </p>
 *
 * <p>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 4.4K bytes

- Viewed (1)

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java

 * <ul>
 *   <li>Static passwords configured via {@link #addPassword(String, String)}</li>
 *   <li>Dynamic passwords provided through extraction parameters</li>
 * </ul>
 *
 * <p>Passwords are matched against URLs or resource names using regular expression patterns.
 * The extractor first tries to match against the URL, then falls back to the resource name if available.
 *
 * @author shinsuke
 */

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 5.1K bytes

- Viewed (0)
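
The lookup order described in the PasswordBasedExtractor snippet above (match the URL first, then fall back to the resource name) could look roughly like the following. This is a minimal sketch, not the class's actual code: `PasswordLookupSketch`, `findPassword`, and the full-match semantics of `Pattern.matcher(...).matches()` are assumptions made for illustration. Only `addPassword(String, String)` is named in the snippet.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration; not the fess-crawler implementation.
class PasswordLookupSketch {
    // Compiled regex pattern -> password, kept in insertion order.
    private final Map<Pattern, String> passwords = new LinkedHashMap<>();

    // Mirrors the addPassword(String, String) signature named in the Javadoc.
    void addPassword(final String regex, final String password) {
        passwords.put(Pattern.compile(regex), password);
    }

    // Tries the URL first, then falls back to the resource name.
    // Returns null when neither value matches any registered pattern.
    String findPassword(final String url, final String resourceName) {
        String found = match(url);
        if (found == null) {
            found = match(resourceName);
        }
        return found;
    }

    // Returns the password of the first pattern that fully matches the value.
    private String match(final String value) {
        if (value == null) {
            return null;
        }
        for (final Map.Entry<Pattern, String> entry : passwords.entrySet()) {
            if (entry.getKey().matcher(value).matches()) {
                return entry.getValue();
            }
        }
        return null;
    }
}
```

Keeping the patterns in a `LinkedHashMap` makes the first registered pattern win when several match. Whether the real extractor orders its patterns this way is not visible in the snippet.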

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/interval/impl/AbstractIntervalController.java

 * <ul>
 *   <li>Before processing a URL ({@link #delayBeforeProcessing()})</li>
 *   <li>After processing a URL ({@link #delayAfterProcessing()})</li>
 *   <li>When there are no URLs in the queue ({@link #delayAtNoUrlInQueue()})</li>
 *   <li>While waiting for new URLs to be added to the queue ({@link #delayForWaitingNewUrl()})</li>
 * </ul>
 *
 * <p>
 * Subclasses are responsible for implementing the abstract methods to define the actual delay

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 4.5K bytes

- Viewed (0)
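
The AbstractIntervalController snippet above lists four delay hooks that subclasses implement. One way such a design can be arranged is sketched below. The four hook names come from the snippet. The `Phase` enum, the `delay(Phase)` dispatcher, and both class names are assumptions for illustration and are not taken from the fess-crawler source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the template-method idea; not the fess-crawler class.
abstract class IntervalControllerSketch {
    // Assumed phase names standing in for the four situations listed in the Javadoc.
    enum Phase { BEFORE_PROCESSING, AFTER_PROCESSING, NO_URL_IN_QUEUE, WAITING_NEW_URL }

    // Dispatches to the hook that matches the given phase.
    final void delay(final Phase phase) {
        switch (phase) {
        case BEFORE_PROCESSING:
            delayBeforeProcessing();
            break;
        case AFTER_PROCESSING:
            delayAfterProcessing();
            break;
        case NO_URL_IN_QUEUE:
            delayAtNoUrlInQueue();
            break;
        case WAITING_NEW_URL:
            delayForWaitingNewUrl();
            break;
        default:
            break;
        }
    }

    // Hook names as listed in the Javadoc; subclasses decide the actual delay.
    protected abstract void delayBeforeProcessing();
    protected abstract void delayAfterProcessing();
    protected abstract void delayAtNoUrlInQueue();
    protected abstract void delayForWaitingNewUrl();
}

// Example subclass that records which hook ran instead of sleeping.
class RecordingIntervalController extends IntervalControllerSketch {
    final List<String> calls = new ArrayList<>();

    @Override
    protected void delayBeforeProcessing() { calls.add("before"); }

    @Override
    protected void delayAfterProcessing() { calls.add("after"); }

    @Override
    protected void delayAtNoUrlInQueue() { calls.add("noUrl"); }

    @Override
    protected void delayForWaitingNewUrl() { calls.add("waiting"); }
}
```

A production subclass would typically sleep for a configured duration in each hook. The recording variant only makes the dispatch visible.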

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/UrlConvertHelper.java

/**
 * Helper class for converting URLs based on a set of predefined rules.
 *
 * <p>This class provides functionality to convert URLs by replacing parts of the URL
 * based on a map of target strings and their corresponding replacements. It allows
 * adding new conversion rules, setting the entire conversion map, and converting
 * URLs using these rules.</p>
 *

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 3.1K bytes

- Viewed (0)
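
The map-based conversion described in the UrlConvertHelper snippet above can be sketched as follows. This is an illustration under stated assumptions, not the helper's actual code: `UrlConvertSketch`, `add`, and `convert` are hypothetical names, and targets are treated as literal strings via `String.replace`. The real helper may interpret them differently, for example as regular expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the fess-crawler UrlConvertHelper.
class UrlConvertSketch {
    // Target string -> replacement, applied in insertion order.
    private final Map<String, String> convertMap = new LinkedHashMap<>();

    // Registers one conversion rule.
    void add(final String target, final String replacement) {
        convertMap.put(target, replacement);
    }

    // Applies every rule in turn; a null URL is returned unchanged.
    String convert(final String url) {
        if (url == null) {
            return null;
        }
        String result = url;
        for (final Map.Entry<String, String> rule : convertMap.entrySet()) {
            // Literal replacement (assumption): no regex interpretation of the target.
            result = result.replace(rule.getKey(), rule.getValue());
        }
        return result;
    }
}
```

Because rules run in order, a later rule sees the output of earlier ones, so the order of registration can change the result.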

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/Crawler.java

import org.codelibs.fess.crawler.service.UrlQueueService;

import jakarta.annotation.Resource;

/**
 * The Crawler class is the main class for web crawling. It manages the crawling process,
 * including adding URLs to the queue, filtering URLs, managing crawler threads,
 * and handling the overall crawling lifecycle.
 *
 * <p>It implements the Runnable interface to be executed in a separate thread,

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 14K bytes

- Viewed (0)

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/converter/KatakanaConverter.java

        try (TokenStream stream = createTokenStream(rd)) {
            if (stream == null) {
                throw new IOException("Invalid tokenizer.");
            }
            stream.reset();

            int offset = 0;
            while (stream.incrementToken()) {
                final CharTermAttribute att = stream.getAttribute(CharTermAttribute.class);
                final String term = att.toString();

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Fri Jul 04 14:00:23 UTC 2025

- 6.1K bytes

- Viewed (0)

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/RobotsTxt.java

    public void addSitemap(final String url) {
        if (!sitemapList.contains(url)) {
            sitemapList.add(url);
        }
    }

    /**
     * Returns an array of sitemap URLs.
     *
     * @return an array of sitemap URLs
     */
    public String[] getSitemaps() {
        return sitemapList.toArray(new String[sitemapList.size()]);
    }

    /**
     * Represents a directive in a robots.txt file.

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 10K bytes

- Viewed (0)

github.com/codelibs/fess-crawler

README.md

```
</components>
```

### Crawler Context Configuration

```java
// Set maximum number of URLs to crawl
crawler.crawlerContext.setMaxAccessCount(1000);

// Set number of crawler threads
crawler.crawlerContext.setNumOfThread(10);

// Set maximum crawl depth
crawler.crawlerContext.setMaxDepth(3);
```

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Aug 31 05:32:52 UTC 2025

- 15.3K bytes

- Viewed (0)

github.com/codelibs/fess-crawler

fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/OpenSearchUrlQueueService.java

        public QueueHolder() {
            // Default constructor
        }

        /**
         * The queue for URLs waiting to be crawled.
         */
        protected Queue<OpenSearchUrlQueue> waitingQueue = new ConcurrentLinkedQueue<>();

        /**
         * The queue for URLs currently being crawled.
         */
        protected Queue<OpenSearchUrlQueue> crawlingQueue = new ConcurrentLinkedQueue<>();
    }

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 17K bytes

- Viewed (1)
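
The QueueHolder snippet above keeps one queue for waiting URLs and one for URLs being crawled. A minimal sketch of how entries might move between two such queues follows. It uses `String` in place of `OpenSearchUrlQueue`, and `QueueHolderSketch`, `offer`, `poll`, and `finish` are hypothetical names. The actual service logic is not shown in the snippet and may differ.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration; not the OpenSearchUrlQueueService implementation.
class QueueHolderSketch {
    // URLs waiting to be crawled.
    final Queue<String> waitingQueue = new ConcurrentLinkedQueue<>();
    // URLs currently being crawled.
    final Queue<String> crawlingQueue = new ConcurrentLinkedQueue<>();

    // Adds a URL to the waiting queue.
    void offer(final String url) {
        waitingQueue.add(url);
    }

    // Takes the next waiting URL and marks it as being crawled.
    // Returns null when nothing is waiting.
    String poll() {
        final String url = waitingQueue.poll();
        if (url != null) {
            crawlingQueue.add(url);
        }
        return url;
    }

    // Removes a URL from the crawling queue once its processing is done.
    boolean finish(final String url) {
        return crawlingQueue.remove(url);
    }
}
```

Each queue is thread-safe on its own, but the move from one queue to the other in `poll` is not atomic. A caller that needs a consistent view of both queues would have to add its own synchronization.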
