Can - Code Search

fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/RobotsTxt.java

     * excluding the general "*" pattern which matches all bots.
     *
     * @param userAgent the user agent string to match against directives,
     *                 can be null (treated as empty string)
     * @return the most specific matching directive, or null if no directive matches
     */
    public Directive getMatchedDirective(final String userAgent) {
        final String target;

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 10K bytes
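The RobotsTxt Javadoc above says `getMatchedDirective` returns the most specific directive for a user agent, ignores the catch-all "*" pattern, and treats a null user agent as an empty string. Below is a minimal sketch of that selection rule. It assumes that "most specific" means the longest pattern contained in the user-agent string, compared case-insensitively. `UserAgentMatcher` and `mostSpecificMatch` are hypothetical names and are not fess-crawler API. The snippet does not show whether the real method falls back to "*" when nothing else matches.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper, not part of fess-crawler: returns the most specific
 * (assumed: longest) user-agent pattern found in the target string, ignoring "*".
 */
final class UserAgentMatcher {
    private UserAgentMatcher() {
    }

    static String mostSpecificMatch(final List<String> patterns, final String userAgent) {
        // Mirror the Javadoc: a null user agent is treated as an empty string.
        final String target = userAgent == null ? "" : userAgent.toLowerCase(Locale.ROOT);
        String best = null;
        for (final String pattern : patterns) {
            // Skip the catch-all "*" (and empty entries) when ranking specificity.
            if (pattern.isEmpty() || "*".equals(pattern)) {
                continue;
            }
            final boolean matches = target.contains(pattern.toLowerCase(Locale.ROOT));
            if (matches && (best == null || pattern.length() > best.length())) {
                best = pattern;
            }
        }
        return best; // null when no specific pattern matches
    }
}
```

Given the patterns "*", "Fess", and "FessCrawler", a user agent containing "FessCrawler/1.0" resolves to "FessCrawler" because it is the longer match. A null user agent matches nothing here.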

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/EmlExtractor.java

        } catch (MessagingException | IOException e) {
            throw new ExtractException(e);
        }
        return buf.toString();
    }

    /**
     * Appends attachment content to the buffer if it can be extracted.
     *
     * @param buf the buffer to append content to
     * @param bodyPart the body part containing the attachment
     */

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 12.6K bytes
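The EmlExtractor Javadoc describes appending attachment content to the buffer only "if it can be extracted". The following sketch shows that best-effort pattern using only the JDK. It assumes that a missing extractor or a failed extraction should leave the buffer untouched rather than abort the whole message. `TextExtractor` and `AttachmentAppender` are hypothetical stand-ins. They are not the fess-crawler `Extractor` interface or the Jakarta Mail `BodyPart` API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical extractor abstraction; not the fess-crawler Extractor API. */
interface TextExtractor {
    String extract(InputStream in) throws IOException;
}

final class AttachmentAppender {
    private AttachmentAppender() {
    }

    /** Appends extracted attachment text, or leaves buf unchanged if extraction is impossible. */
    static void appendAttachment(final StringBuilder buf, final byte[] attachment, final TextExtractor extractor) {
        if (extractor == null || attachment == null) {
            return; // no extractor for this content type: skip the attachment
        }
        try (InputStream in = new ByteArrayInputStream(attachment)) {
            final String text = extractor.extract(in);
            if (text != null && !text.isBlank()) {
                buf.append(' ').append(text.trim());
            }
        } catch (final IOException | RuntimeException e) {
            // Best effort: one broken attachment must not fail the whole message.
        }
    }
}
```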

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/Crawler.java

 *   <li>Execution: Starts the crawler threads and waits for them to complete.</li>
 *   <li>Cleanup: Deletes the crawled data and clears the URL filter.</li>
 * </ol>
 *
 * <p>The crawler can be configured with various parameters, such as the number of threads,
 * the maximum depth of crawling, and URL filters.
 *
 * <p>Example usage:
 * <pre>
 *   Crawler crawler = new Crawler();

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 14K bytes
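The Crawler Javadoc lists an execution phase (start threads, wait for them) and a cleanup phase (delete crawled data, clear the URL filter), with configurable thread count, crawl depth, and URL filters. The sketch below illustrates those ideas in plain Java. It makes three assumptions: crawling proceeds one depth level at a time, an in-memory link map replaces HTTP fetching, and a `Predicate` plays the role of the URL filter. `MiniCrawler` is a hypothetical class and is not the fess-crawler `Crawler` API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical crawler sketch; not the fess-crawler Crawler API. */
final class MiniCrawler {
    private final int numThreads;
    private final int maxDepth;
    private final Predicate<String> urlFilter;
    // Only the coordinating thread touches this set, so a HashSet is enough.
    private final Set<String> visited = new HashSet<>();

    MiniCrawler(final int numThreads, final int maxDepth, final Predicate<String> urlFilter) {
        this.numThreads = numThreads;
        this.maxDepth = maxDepth;
        this.urlFilter = urlFilter;
    }

    /** Execution phase: fetch each depth level in parallel, then wait for it. */
    Set<String> crawl(final String seed, final Map<String, List<String>> links) {
        final ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<String> frontier = List.of(seed);
            for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
                final List<Future<List<String>>> futures = new ArrayList<>();
                for (final String url : frontier) {
                    if (urlFilter.test(url) && visited.add(url)) {
                        // The map lookup stands in for an HTTP fetch plus link extraction.
                        futures.add(pool.submit(() -> links.getOrDefault(url, List.of())));
                    }
                }
                final List<String> next = new ArrayList<>();
                for (final Future<List<String>> future : futures) {
                    next.addAll(future.get()); // blocks until that worker finishes
                }
                frontier = next;
            }
            return Set.copyOf(visited);
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (final ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Cleanup phase: forget crawled state so the instance can be reused. */
    void cleanup() {
        visited.clear();
    }
}
```

With `maxDepth` set to 1, only the seed and its direct links are visited. URLs rejected by the filter are never fetched, so their links are never followed.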

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java

import org.codelibs.fess.crawler.helper.MimeTypeHelper;

/**
 * PdfExtractor extracts text content from PDF files using Apache PDFBox.
 * It supports password-protected PDFs and can extract embedded documents and annotations.
 *
 * <p>The extractor runs text extraction in a separate thread with a configurable timeout
 * to prevent hanging on problematic PDF files. It also extracts metadata from the PDF

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 12.7K bytes
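The PdfExtractor Javadoc says that text extraction runs in a separate thread with a configurable timeout, so that a problematic PDF cannot hang the crawler. A common JDK way to implement that idea is to submit the work to an `ExecutorService` and call `Future.get(timeout, unit)`, as sketched below. The `Callable` stands in for the Apache PDFBox extraction call. `TimedExtraction` and `extractWithTimeout` are hypothetical names, and the real class may handle timeouts differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; the Callable stands in for the PDFBox text extraction. */
final class TimedExtraction {
    private TimedExtraction() {
    }

    /** Runs task on a worker thread and gives up after timeoutMillis. */
    static String extractWithTimeout(final Callable<String> task, final long timeoutMillis) {
        final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            final Thread t = new Thread(r, "pdf-extractor");
            t.setDaemon(true); // a stuck parse must not keep the JVM alive
            return t;
        });
        final Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (final TimeoutException e) {
            future.cancel(true); // interrupt the worker; a parser may still ignore it
            throw new IllegalStateException("Extraction timed out after " + timeoutMillis + " ms", e);
        } catch (final ExecutionException e) {
            throw new IllegalStateException("Extraction failed", e.getCause());
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for extraction", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The daemon thread factory is a deliberate choice. Cancelling a `Future` only interrupts the worker thread, and a parser that ignores interrupts would otherwise keep the JVM running.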

github.com/codelibs/fess-crawler

fess-crawler/src/main/resources/org/codelibs/fess/crawler/mime/tika-mimetypes.xml

     the things they contain, so they don't accidentally get detected
     as what's in them
   * For logic too complex to be expressed in a magic match, do the best
     you can here, then provide a Custom Detector for the rest
-->
<mime-info xmlns:tika="https://tika.apache.org/">

  <mime-type type="application/activemessage"/>
  <mime-type type="application/andrew-inset">

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Mar 13 08:18:01 UTC 2025

- 320.1K bytes
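The tika-mimetypes.xml comment refers to "magic" matches, which identify a file type by bytes at known offsets, with anything more complex left to a custom detector. The sketch below shows a much-simplified version of that model. Each rule is a byte pattern at a fixed offset with a priority, and the highest-priority matching rule wins. Tika's real `<magic>`/`<match>` syntax is far richer (it supports several match types and nested matches). `MagicDetector` and its methods are hypothetical, and the octet-stream fallback is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified magic-byte detector; not the Tika API. */
final class MagicDetector {
    /** One rule: the pattern must appear at the given offset. */
    record Rule(String mimeType, int priority, int offset, byte[] pattern) {
    }

    private final List<Rule> rules = new ArrayList<>();

    /** ISO-8859-1 maps each char 0-255 to exactly one byte, so patterns stay byte-exact. */
    void add(final String mimeType, final int priority, final int offset, final String pattern) {
        rules.add(new Rule(mimeType, priority, offset, pattern.getBytes(StandardCharsets.ISO_8859_1)));
    }

    /** Returns the highest-priority matching type, or a generic fallback. */
    String detect(final byte[] data) {
        return rules.stream()
                .filter(r -> matches(r, data))
                .max(Comparator.comparingInt(Rule::priority))
                .map(Rule::mimeType)
                .orElse("application/octet-stream");
    }

    private static boolean matches(final Rule r, final byte[] data) {
        final int end = r.offset() + r.pattern().length;
        return end <= data.length
                && Arrays.equals(Arrays.copyOfRange(data, r.offset(), end), r.pattern());
    }
}
```

Priorities are what let a container rule and a more specific rule coexist. The more specific rule gets the higher priority, so the container's generic signature does not win by accident.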

github.com/codelibs/fess-crawler

fess-crawler/src/test/java/org/codelibs/fess/crawler/rule/impl/AbstractRuleTest.java

        assertSame(processor2, registeredRule.getResponseProcessor());
    }

    /**
     * Test inheritance behavior
     */
    public void test_inheritanceBehavior() {
        // Test that subclass can override match method
        ConditionalAbstractRule conditionalRule = new ConditionalAbstractRule();
        TestAbstractRule testRule = new TestAbstractRule();

        ResponseData responseData = new ResponseData();

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Wed Sep 03 14:42:53 UTC 2025

- 21.9K bytes
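The AbstractRuleTest snippet checks that a subclass can override the `match` method. The sketch below shows what that test relies on: one rule that accepts everything, one conditional rule, and both called through the base type. The sketch simplifies in three ways: `match` takes a URL string instead of a `ResponseData`, the base class declares `match` abstract, and `SketchRule`, `AlwaysRule`, and `PrefixRule` are hypothetical classes that are not part of fess-crawler.

```java
/** Hypothetical base rule; the real AbstractRule signature is not shown in the snippet. */
abstract class SketchRule {
    /** Subclasses decide whether they apply to a response (here simplified to its URL). */
    abstract boolean match(String url);
}

/** Comparable to a plain test rule: applies to every response. */
class AlwaysRule extends SketchRule {
    @Override
    boolean match(final String url) {
        return true;
    }
}

/** Comparable to a conditional rule: narrows match() to a URL prefix. */
class PrefixRule extends SketchRule {
    private final String prefix;

    PrefixRule(final String prefix) {
        this.prefix = prefix;
    }

    @Override
    boolean match(final String url) {
        return url != null && url.startsWith(prefix);
    }
}
```

Because callers only hold a `SketchRule` reference, dynamic dispatch selects each subclass's `match`. That is the behavior an inheritance test like the one above asserts.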

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/client/ftp/FtpClient.java

 *
 * <p>
 * The class uses Apache Commons Net library for FTP communication. It maintains a queue of FTPClient
 * instances to improve performance by reusing connections.
 * </p>
 *
 * <p>
 * The client can be configured with FTP-specific settings via init parameters, such as:
 * </p>
 * <ul>
 *   <li>ftpConfigSystemKey: The system key used to configure the FTPClientConfig.</li>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 39.5K bytes
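The FtpClient Javadoc says the class keeps a queue of Commons Net `FTPClient` instances so that connections can be reused. The generic sketch below uses a `ConcurrentLinkedQueue` to show that idea. A borrow takes an idle client if one is queued and otherwise creates a new one, and a release puts the client back. `ClientPool`, `borrow`, `release`, and `createdCount` are hypothetical names. A real FTP pool would also need to disconnect clients that are broken instead of returning them to the queue.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical connection-reuse pool; T would be an FTP client in the real class. */
final class ClientPool<T> {
    private final Queue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;
    private final AtomicInteger created = new AtomicInteger();

    ClientPool(final Supplier<T> factory) {
        this.factory = factory;
    }

    /** Reuses an idle client when one is queued, otherwise creates a new one. */
    T borrow() {
        T client = idle.poll();
        if (client == null) {
            created.incrementAndGet();
            client = factory.get();
        }
        return client;
    }

    /** Returns a still-usable client to the queue for the next request. */
    void release(final T client) {
        if (client != null) {
            idle.offer(client);
        }
    }

    int createdCount() {
        return created.get();
    }
}
```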

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/XmlTransformer.java

 * </p>
 *
 * <p>
 * The getData method returns the data extracted from AccessResultData. It can return either a String representation of the XML or a Map/Bean representation based on the configured dataClass.
 * </p>
 *
 * <p>
 * Example Usage:
 * </p>
 *
 * <pre>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 23.9K bytes
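The XmlTransformer Javadoc says `getData` returns either a String form of the XML or a Map/Bean form, depending on the configured `dataClass`. The sketch below shows that kind of dispatch on a `Class<?>` value. It assumes the raw XML and the already-extracted field values are both available. Bean population is left out, and unsupported classes are rejected. `DataConverter` and its `getData` signature are hypothetical and do not match the real method, which reads from `AccessResultData`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the dataClass-based dispatch in getData(). */
final class DataConverter {
    private DataConverter() {
    }

    static Object getData(final String xml, final Map<String, String> fields, final Class<?> dataClass) {
        if (dataClass == null || String.class.equals(dataClass)) {
            return xml; // String form: the XML text itself
        }
        if (Map.class.isAssignableFrom(dataClass)) {
            return new LinkedHashMap<>(fields); // Map form: a copy of the extracted fields
        }
        // Bean mapping (reflection onto dataClass) is omitted from this sketch.
        throw new IllegalArgumentException("Unsupported dataClass: " + dataClass.getName());
    }
}
```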

github.com/codelibs/fess-crawler

fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/AbstractCrawlerService.java

                return null;
            }
            return new Timestamp(DEFAULT_DATE_PRINTER.parseMillis(value));
        }

        /**
         * Determines if this converter can handle the specified class.
         *
         * @param clazz The class to check.
         * @return true if the class is Date.class, false otherwise.
         */
        @Override

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 34.2K bytes
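The AbstractCrawlerService snippet shows a converter that returns null for missing values, turns a date string into a `java.sql.Timestamp` through `DEFAULT_DATE_PRINTER.parseMillis`, and reports that it handles `Date.class`. The sketch below reproduces that shape with `java.time`. It assumes the stored value is an ISO-8601 instant such as `1970-01-01T00:00:01Z`, because the actual `DEFAULT_DATE_PRINTER` pattern is not shown. `TimestampConverter`, `convert`, and `isTarget` are hypothetical names.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.util.Date;

/** Hypothetical converter sketch; the ISO-8601 input format is an assumption. */
final class TimestampConverter {
    /** Returns null for blank input, otherwise the parsed instant as a Timestamp. */
    Object convert(final String value) {
        if (value == null || value.isBlank()) {
            return null;
        }
        return new Timestamp(Instant.parse(value).toEpochMilli());
    }

    /** Handles java.util.Date targets; Timestamp is a Date subclass, so it fits. */
    boolean isTarget(final Class<?> clazz) {
        return clazz == Date.class;
    }
}
```

A converter that targets `Date.class` can return a `Timestamp` because `java.sql.Timestamp` extends `java.util.Date`. The value can therefore be assigned to a `Date`-typed property directly.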