Search Options

Results per page
Sort
Preferred Languages
Advance

Results 31 - 40 of 65 for urlset (0.02 sec)

  1. fess-crawler/src/main/java/org/codelibs/fess/crawler/service/impl/UrlQueueServiceImpl.java

     * This class provides methods for managing a queue of URLs to be crawled,
     * including adding, deleting, and retrieving URLs from the queue.
     * It uses a {@link MemoryDataHelper} to store the URL queue data in memory.
     *
     * <p>
     * The class is responsible for:
     * </p>
     * <ul>
     *   <li>Updating session IDs for URL queues.</li>
     *   <li>Adding new URLs to the queue.</li>
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 9.3K bytes
    - Viewed (0)
  2. fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/SitemapFile.java

     * This class holds information about a single Sitemap, including its location and last modification timestamp.
     * It implements the {@link Sitemap} interface.
     *
     * <p>
     * A Sitemap file provides search engines with a list of URLs available for crawling.
     * This class encapsulates the essential attributes of a Sitemap entry, allowing for efficient management
     * and processing of Sitemap data.
     * </p>
     *
     * <p>
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 4.4K bytes
    - Viewed (1)
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java

     * <ul>
     *   <li>Static passwords configured via {@link #addPassword(String, String)}</li>
     *   <li>Dynamic passwords provided through extraction parameters</li>
     * </ul>
     *
     * <p>Passwords are matched against URLs or resource names using regular expression patterns.
     * The extractor first tries to match against the URL, then falls back to the resource name if available.
     *
     * @author shinsuke
     */
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 5.1K bytes
    - Viewed (0)
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/interval/impl/AbstractIntervalController.java

     * <ul>
     *   <li>Before processing a URL ({@link #delayBeforeProcessing()})</li>
     *   <li>After processing a URL ({@link #delayAfterProcessing()})</li>
     *   <li>When there are no URLs in the queue ({@link #delayAtNoUrlInQueue()})</li>
     *   <li>While waiting for new URLs to be added to the queue ({@link #delayForWaitingNewUrl()})</li>
     * </ul>
     *
     * <p>
     * Subclasses are responsible for implementing the abstract methods to define the actual delay
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 4.5K bytes
    - Viewed (0)
  5. fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/UrlConvertHelper.java

    /**
     * Helper class for converting URLs based on a set of predefined rules.
     *
     * <p>This class provides functionality to convert URLs by replacing parts of the URL
     * based on a map of target strings and their corresponding replacements. It allows
     * adding new conversion rules, setting the entire conversion map, and converting
     * URLs using these rules.</p>
     *
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 3.1K bytes
    - Viewed (0)
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/Crawler.java

    import org.codelibs.fess.crawler.service.UrlQueueService;
    
    import jakarta.annotation.Resource;
    
    /**
     * The Crawler class is the main class for web crawling. It manages the crawling process,
     * including adding URLs to the queue, filtering URLs, managing crawler threads,
     * and handling the overall crawling lifecycle.
     *
     * <p>It implements the Runnable interface to be executed in a separate thread,
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 14K bytes
    - Viewed (0)
  7. src/main/java/org/codelibs/fess/suggest/converter/KatakanaConverter.java

            try (TokenStream stream = createTokenStream(rd)) {
                if (stream == null) {
                    throw new IOException("Invalid tokenizer.");
                }
                stream.reset();
    
                int offset = 0;
                while (stream.incrementToken()) {
                    final CharTermAttribute att = stream.getAttribute(CharTermAttribute.class);
                    final String term = att.toString();
    Registered: Fri Sep 19 09:08:11 UTC 2025
    - Last Modified: Fri Jul 04 14:00:23 UTC 2025
    - 6.1K bytes
    - Viewed (0)
  8. fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/RobotsTxt.java

        public void addSitemap(final String url) {
            if (!sitemapList.contains(url)) {
                sitemapList.add(url);
            }
        }
    
        /**
         * Returns an array of sitemap URLs.
         *
         * @return an array of sitemap URLs
         */
        public String[] getSitemaps() {
            return sitemapList.toArray(new String[sitemapList.size()]);
        }
    
        /**
         * Represents a directive in a robots.txt file.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 10K bytes
    - Viewed (0)
  9. README.md

    </components>
    ```
    
    ### Crawler Context Configuration
    
    ```java
    // Set maximum number of URLs to crawl
    crawler.crawlerContext.setMaxAccessCount(1000);
    
    // Set number of crawler threads
    crawler.crawlerContext.setNumOfThread(10);
    
    // Set maximum crawl depth
    crawler.crawlerContext.setMaxDepth(3);
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
  10. fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/OpenSearchUrlQueueService.java

            public QueueHolder() {
                // Default constructor
            }
    
            /**
             * The queue for URLs waiting to be crawled.
             */
            protected Queue<OpenSearchUrlQueue> waitingQueue = new ConcurrentLinkedQueue<>();
    
            /**
             * The queue for URLs currently being crawled.
             */
            protected Queue<OpenSearchUrlQueue> crawlingQueue = new ConcurrentLinkedQueue<>();
        }
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 17K bytes
    - Viewed (1)
Back to top