Gleanings from the Sea of Learning - Making Things Clear to Those Who Don't Know: March 2009

Sunday, March 22, 2009

Research Funding Websites

National Natural Science Foundation of China (NSFC) website: http://www.nsfc.gov.cn/nsfc2008/index.htm

Grant lookup site (ISIS): http://isis.nsfc.gov.cn/portal/index.asp

Sunday, March 15, 2009

A Survey on Ontology Mapping
[2006][SIGMOD][Namyoun Choi]
Abstract: Ontologies are increasingly seen as a key factor in enabling interoperability between heterogeneous systems and Semantic Web (SW) applications. The paper presents the categories of ontology mapping, describes the characteristics of each category, compares these characteristics, and surveys the tools, systems, and related work in each category.
I. Introduction:
1. Ontologies are developed independently, so ontologies for the same or overlapping domains are defined differently, and the independent development communities do not communicate enough with one another. Therefore:
-- ontology mapping is needed to support interoperability between them.
2. The paper divides ontology mapping into three categories [with related papers]:
1) Integrated global ontology <----> local ontologies -- describes the relationship between the global ontology and the local ontologies
[3] Learning to Match the Schemas of Data Sources: A Multistrategy Approach (2003) [4] Synthesizing an Integrated Ontology (2003) [1] Ontologies for Enterprise Knowledge Management (2003) [7] A Framework for Ontology Integration (2001)
2) Local ontology <----> local ontology -- makes interoperability possible in highly dynamic and distributed environments
[6] C-OWL: Contextualizing Ontologies (2003) [1] Ontologies for Enterprise Knowledge Management (2003) [8] Semantic Coordination: A New Approach and an Application (2003) [9] Learning to Map between Ontologies on the Semantic Web (2003) [12] MAFRA - An Ontology Mapping FRAmework for the Semantic Web (2003) [13] Resolving Terminological Heterogeneity in Ontologies (2002) [14] Representing and Reasoning about Mappings between Domain Models (AAAI 2002)
3) Mapping for ontology merging and alignment -- treated as one approach to ontology reuse
[15] PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment (AAAI 2000) [16] OntoMorph: A Translation System for Symbolic Knowledge (2000) [17] FCA-Merge: Bottom-Up Merging of Ontologies (2001) [18] SMART: Automated Support for Ontology Merging and Alignment (1999) [19] Rule Induction for Concept Hierarchy Alignment (IJCAI 2001) [20] Anchor-PROMPT: Using Non-Local Context for Semantic Matching (IJCAI 2001)
3. Content of the paper:
The tools and systems in these three categories of ontology mapping are compared against detailed evaluation criteria. The criteria are:
a. input requirements
b. level of user interaction
c. type of output
d. content of output
e. the following five dimensions: structural, lexical, domain, instance-based knowledge, and type of result
4. The paper has four sections (Section 1 is the introduction):
Section 2: meanings of ontology mapping, ontology integration, merging, and alignment
Section 3:
·The characteristics and application domains of the three categories of ontology mapping are discussed.
·The tools, systems, frameworks, and related work of ontology mapping are surveyed by category.
·An overall comparison of ontology mapping tools and systems is then presented.
Section 4: conclusion and future work
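II. Terminology: ontology mapping, ontology integration, merging, and alignment
1. Ontology mapping is used in several areas: ontology integration, merging, and alignment -- the tools used in these three areas are all classed as ontology mapping tools.
A related area is schema matching, a major research topic in the database field, which the paper does not cover.
[3] Learning to Match the Schemas of Data Sources: A Multistrategy Approach (2003) [36] A Comparative Analysis of Methodologies for Database Schema Integration (1986) [37] Semantic Integration Research in the Database Community: A Brief Survey (2005) [38] Corpus-based Schema Matching (2005)
2. Ontology integration, merging, and alignment can all be regarded as forms of ontology reuse.
·ontology merging:
-- the process of generating a single, coherent ontology from two or more existing, different ontologies related to the same subject;
-- the ontologies being merged are similar or partially overlapping
·ontology integration:
-- the process of generating a single ontology in one subject from two or more existing, different ontologies that describe different subjects
-- the source ontologies should be related; some of their content is expected to change in the integrated result
·ontology alignment:
-- creates links between two original ontologies
-- the source ontologies are made consistent with each other but kept separate; this is usually done when their domains complement each other.
3. Ontology mapping (defined separately for the three cases)
1) Integrated global ontology <----> local ontologies
Maps a concept found in one ontology into a view, or a query, over other ontologies.
2) Local ontology <----> local ontology
The process of translating entities of a source ontology into the corresponding entities of a target ontology on the basis of semantic relations. The source and target entities are semantically related at the conceptual level.
3) Mapping for ontology merging and alignment
In this case, ontology mapping establishes correspondences among the source ontologies and determines the set of overlapping concepts, synonyms, or concepts that are unique to one source. The mapping identifies the similarities and differences among the source ontologies so that they can be merged or aligned.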
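III. Categories of ontology mapping:
1. Integrated global ontology <----> local ontologies
1.1 Strengths and drawbacks
Strengths: mappings and mapping rules are easy to define and to find (because the global ontology provides a shared vocabulary). Drawbacks: poor maintainability and scalability, because a change to a local ontology can easily affect its mapping to the global ontology.
This kind of mapping cannot be made between different ontologies of the same or similar domains that contain mutually inconsistent information, because no global ontology can be created.
1.2 Application domains
Semantic Web, enterprise knowledge management, information/data integration
1.3 Tools, systems, and related work
·LSD (Learning Source Description): semi-automatically creates semantic mappings using a multi-strategy learning approach.
It contains several learners of two kinds, base learners and a meta-learner
-- base learners: the Name Learner, Content Learner, Naïve Bayes Learner, and XML Learner
-- meta-learner: combines the predictions of the base learners
The mapping process has two phases:
-- training: a small set of data sources is manually mapped to the mediated schema and used to train the base learners and the meta-learner
-- matching: the trained learners predict mappings for new sources and match the schema of each new input source to the mediated schema
·MOMIS (Mediator Environment for Multiple Information Sources): MOMIS creates a global virtual view (GVV) of information sources, independent of their location or their data’s heterogeneity.
It proceeds in five phases (omitted here).
·A Framework for OIS (Ontology Integration System):
-- uses description logic to express the ontologies and the mapping queries between them
-- uses two approaches: mapping each concept of the global ontology to concepts of the local ontologies (global-centric approach), and mapping each concept of a local ontology to concepts of the global ontology (local-centric approach)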
2. Local ontology <----> local ontology (a key focus)
Better suited to interoperability in the highly dynamic, open, and distributed environment of the Web.
2.1 Strengths and drawbacks
Strengths: provides interoperability between local ontologies that cannot be integrated or merged because they contain mutually inconsistent information. Drawbacks: the local ontologies lack a common vocabulary.
2.2 Application domains
Web, Semantic Web
2.3 Tools, systems, and related work
·Context OWL: Contextualizing Ontologies
·CTXMATCH: an algorithm for discovering semantic mappings across hierarchical classifications (HCs) using logical deduction.
·GLUE: semi-automatically creates ontology mappings using machine learning techniques.
·MAFRA (MApping FRAmework for distributed ontologies in the Semantic Web): provides a distributed mapping process that consists of five horizontal and four vertical modules.
·LOM: Lexicon-based Ontology Mapping
·QOM: Quick Ontology Mapping, an efficient method for identifying mappings between two ontologies because it has lower run-time complexity.
·ONION: ONtology compositION system
·OKMS: Ontology-based knowledge management system. Mapping is used for combining distributed and heterogeneous ontologies.
·OMEN: Ontology Mapping Enhancer, a probabilistic ontology mapping tool that enhances the quality of existing ontology mappings using a Bayesian net.
·P2P ontology mapping: this work proposes a framework that allows agents to interact with other agents efficiently, based on dynamic mapping of only the portion of the ontologies relevant to the interaction.
3. Mapping for ontology merging and alignment
A single coherent ontology can be created through the merging process. Mapping also creates links between local ontologies, which remain separate, during the ontology alignment process.
No mapping exists between the merged ontology and the local ontologies it was built from; the mappings exist among the local ontologies being merged.
First step: identify the similarities and conflicts among the local ontologies to be merged or aligned.
3.1 Strengths and drawbacks
3.2 Application domains
Many applications such as standard search, e-commerce, government intelligence, medicine, etc., have large-scale ontologies and require the reuse of ontology merging processes.
3.3 Tools, systems, and related work
·SMART: SMART is a semi-automatic ontology merging and alignment tool.
·PROMPT: PROMPT is a semi-automatic ontology merging and alignment tool.
·OntoMorph: OntoMorph provides a powerful rule language for specifying mappings, and facilitates ontology merging and the rapid generation of knowledge-base translators.
·HICAL (Hierarchical Concept Alignment system): provides concept hierarchy management for ontology merging/alignment (one concept hierarchy is aligned with another concept in another concept hierarchy), uses a machine-learning method for aligning multiple concept hierarchies, and exploits the data instances in the overlap between the two taxonomies to infer mappings.
·Anchor-PROMPT: takes a set of anchors (pairs of related terms) from the source ontologies and traverses the paths between the anchors in the source ontologies.
·CMS (CROSI Mapping System): CMS is an ontology alignment system.
·FCA-Merge: a method for ontology merging based on Ganter and Wille’s formal concept analysis, lattice exploration, and instances of the ontologies to be merged.
·CHIMAERA: CHIMAERA is an interactive ontology merging tool based on the Ontolingua ontology editor.
4. Comparison of ontology mapping tools and systems
IV. Conclusion
[a figure from the paper]
Five papers to read closely next:
A. [2002][WWW][AH Doan][Learning to Map Between Ontologies on the Semantic Web]
B. [2002][EKAW][Alexander Maedche][MAFRA - A MApping FRAmework for Distributed Ontologies]
C. [2003][IEEE Internet Computing][D Beneventano][Synthesizing an Integrated Ontology] (not yet printed)
D. [2004][ISWC][John Li][QOM - Quick Ontology Mapping] (not yet printed)
E. [2003][IEEE Intelligent Systems][A Maedche][Ontologies for Enterprise Knowledge Management] (not yet printed)

LaTeX

LaTeX notation for common mathematical symbols
http://www.cfsm.cn/info/symbols/symbols.htm

Similarity Flooding

http://www.blogjava.net/weidagang2046/articles/81825.html
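General idea of the algorithm: the models to be matched are converted into directed labeled graphs (graphs made of nodes and arcs, in which an object is defined by its own attributes and by its relationships to other objects, similar to an ER diagram). These graphs are used in an iterative fixpoint computation whose result tells us which nodes of one graph are similar to nodes of the second graph. To compute similarity we rely on the intuition that two distinct nodes are similar when their adjacent elements are similar. In other words, part of the similarity of two elements propagates to their respective neighbors. This spreading resembles IP broadcast flooding, which is where the name Similarity Flooding (SF) comes from. We call the result of the algorithm a mapping; depending on the matching goal, a specific filter then selects a subset of the raw result. We expect a human to correct the result, and the number of corrections needed reflects the accuracy of the algorithm.

Overview: suppose there are two schemas, S1 and S2. For every element of S1 we want to find the matching element in S2. The procedure is:
1. G1 = SQL2Graph(S1); G2 = SQL2Graph(S2); converts the schemas into graphs. The graphs follow the Open Information Model (OIM) specification. Nodes are drawn as rectangles and ovals: rectangles are literal text, ovals are identifiers.
2. initialMap = StringMatch(G1, G2); uses string matching as the initial mapping, mainly by comparing common prefixes and suffixes. The result is usually inaccurate.
3. product = SFJoin(G1, G2, initialMap); runs the SF algorithm to produce the result. If two distinct nodes are assumed to be similar, the similarity of their adjacent elements increases. Over a series of iterations this similarity spreads through the whole graph.
4. result = SelectThreshold(product); filters the result.

SF algorithm: every edge of a graph is represented as a triple (s, p, o): source node, edge label, and target node.

Similarity propagation graph: first define the pairwise connectivity graph (PCG): ((x, y), p, (x', y')) ∈ PCG(A, B) <==> (x, p, x') ∈ A and (y, p, y') ∈ B. The key point is that p must be the same, i.e. the two edges carry the same label. Reading the definition from right to left, the PCG can be built from the two models A and B. Each node of the PCG is a pair of elements from A and B, called a map pair. The induced propagation graph is derived from the PCG by adding reverse edges; each edge is annotated with a [propagation coefficient] of 1/n, where n is the number of corresponding edges.

Fixpoint computation: let σ(x, y) > 0 denote the similarity of nodes x ∈ A and y ∈ B, defined over the whole of A × B. We call σ a mapping. Similarity is computed by iterating over the σ-values. Let σi be the result after the i-th iteration and σ0 the initial similarity (it can be obtained from string similarity; our example has no initial mapping, so σ0 = 1). In each iteration the σ-value of a pair is increased by the σ-values of its neighboring pairs multiplied by the [propagation coefficients]. For example, in the first iteration σ1(a1, b1) = σ0(a1, b1) + σ0(a, b) × 0.5 = 1.5. Similarly, σ1(a, b) = σ0(a, b) + σ0(a1, b1) × 1.0 + σ0(a2, b1) × 1.0 = 3.0. All σ-values are then normalized, for instance by dividing by the largest σ-value of the current iteration, so that no σ exceeds 1. After normalization, σ1(a, b) = 1.0 and σ1(a1, b1) = 1.5/3.0 = 0.5. In general, the iteration is σi+1 = normalize(σi + φ(σi)), where φ(σi) is the increment each pair receives from its neighbors (their σi-values times the propagation coefficients).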
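The PCG construction, the propagation coefficients, and one normalized iteration step can be sketched in Python. This is a minimal sketch, not the authors' implementation. The function names are hypothetical. The two tiny models `EDGES_A` and `EDGES_B` are an assumption, chosen only so that the first iteration reproduces the numbers quoted above (1.5 and 3.0 before normalization). σ0 is set to 1 for every pair, and the stopping test (largest per-pair change below `eps`) is also an assumption.

```python
from collections import defaultdict


def build_pcg(edges_a, edges_b):
    """Pairwise connectivity graph: join the edges of A and B on equal labels."""
    return [((x, y), p, (x2, y2))
            for (x, p, x2) in edges_a
            for (y, q, y2) in edges_b
            if p == q]


def propagation_weights(pcg):
    """Induced propagation graph: forward and reverse edges weighted 1/n,
    where n counts the same-label edges leaving (or entering) a map pair."""
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, label, dst in pcg:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1
    weights = defaultdict(float)  # (from_pair, to_pair) -> coefficient
    for src, label, dst in pcg:
        weights[(src, dst)] += 1.0 / out_count[(src, label)]
        weights[(dst, src)] += 1.0 / in_count[(dst, label)]
    return weights


def flood_step(sigma, weights):
    """One iteration: add the weighted neighbor values, then divide by the maximum."""
    nxt = dict(sigma)
    for (frm, to), coeff in weights.items():
        nxt[to] += sigma[frm] * coeff
    top = max(nxt.values())
    return {pair: value / top for pair, value in nxt.items()}


def similarity_flooding(edges_a, edges_b, max_iter=100, eps=1e-6):
    """Iterate flood_step until the values settle or max_iter is reached."""
    pcg = build_pcg(edges_a, edges_b)
    weights = propagation_weights(pcg)
    pairs = {node for src, _, dst in pcg for node in (src, dst)}
    sigma = {pair: 1.0 for pair in pairs}  # assumed initial similarity of 1
    for _ in range(max_iter):
        nxt = flood_step(sigma, weights)
        if max(abs(nxt[p] - sigma[p]) for p in pairs) < eps:
            return nxt
        sigma = nxt
    return sigma


# Illustrative models (an assumption, not taken from the notes above).
EDGES_A = [("a", "l1", "a1"), ("a", "l1", "a2"), ("a1", "l2", "a2")]
EDGES_B = [("b", "l1", "b1"), ("b", "l2", "b2"), ("b2", "l2", "b1")]

pcg = build_pcg(EDGES_A, EDGES_B)
weights = propagation_weights(pcg)
start = {node: 1.0 for src, _, dst in pcg for node in (src, dst)}
first = flood_step(start, weights)
print(first[("a", "b")], first[("a1", "b1")])  # 1.0 0.5
```

Pairs that never appear in the PCG receive no propagated similarity, so the sketch leaves them out of `sigma` instead of tracking all of A × B.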
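The computation above is iterated until the difference between σn and σn-1 falls below a threshold; if the computation does not converge, we stop after a fixed number of iterations. The third graph of Figure 3 shows the result after 5 iterations. Table 3 lists several variants of the fixpoint formula; the later experiments show that C works best. (A is called sparse, B expected, and C verbose.)

Filters: the result of the iteration is a [multimapping] that may contain useful match subsets. Three steps:
1. Filter using application-defined [constraints].
2. Filter using selection techniques from matching in bipartite graphs.
3. Compare the effectiveness of the various techniques (how well they meet user needs).

Constraints: there are two main kinds. The first is a [typing] constraint, e.g. considering only column-to-column matches (both sides of a match are columns). The second is a cardinality constraint, e.g. every element of schema S1 must have a mapping in S2.

Stable marriage problem: n women and n men are paired so that there are no two couples (x, y) and (x', y') where x prefers y' to y and y' prefers x to x'. A match result that satisfies the stable marriage property may have a lower total satisfaction than one that does not!

Evaluating match quality: the basic idea is that the fewer changes the user has to make to the match result, the higher the match quality (changes consist of removing wrong pairs and adding correct pairs that are missing). n is the number of matches found, m is the number of intended matches, and c is the number of found matches that are correct, so the user has to remove n - c pairs and add m - c pairs.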
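The match-quality idea above can be turned into a small calculation. The sketch below assumes the accuracy measure accuracy = 1 - ((n - c) + (m - c)) / m, where n is the number of pairs found, m the number of intended pairs, and c the number of found pairs that are correct. The function name and the sample pairs are hypothetical.

```python
def match_accuracy(found, intended):
    """Accuracy = 1 - (deletions + additions) / m for two sets of map pairs.

    Assumes: n = len(found), m = len(intended), c = pairs in both sets.
    The result is negative when fixing the suggestion costs more work
    than building the intended match from scratch.
    """
    if not intended:
        raise ValueError("the intended match result must not be empty")
    n = len(found)
    m = len(intended)
    c = len(found & intended)
    return 1.0 - ((n - c) + (m - c)) / m


# Hypothetical pairs: 3 found, 4 intended, 2 of the found pairs are correct.
found = {("a", "b"), ("a1", "b1"), ("a2", "b2")}
intended = {("a", "b"), ("a1", "b1"), ("a1", "b2"), ("a2", "b1")}
print(match_accuracy(found, intended))  # 0.25
```

Here the user would delete one wrong pair and add two missing ones, three edits against four intended pairs, which gives 1 - 3/4 = 0.25.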

Similarity Flooding (Full Notes)

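Abstract: Matching elements of two data schemas or two data instances is important in data warehousing, e-business, and other areas. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs as input and outputs a mapping between corresponding nodes of the graphs. Depending on the matching goal, a filter selects a subset of the mapping. After the algorithm runs, we expect a human to check the result and correct it if necessary. In fact, we evaluate the accuracy of the algorithm by the number of corrections needed. We introduce an example that uses the accuracy metric to evaluate how much time users save by using our algorithm to obtain an initial match. Finally, we discuss how the algorithm is deployed as a high-level operator in an implemented testbed for managing information models and mappings.

Keywords: Matching, Model Management, Heterogeneous Databases, Semistructured Data

1. Introduction
2. Overview of the approach
3. The SF algorithm
Similarity propagation graph
Fixpoint computation
4. Filters
Constraints
Selection metrics
5. Examples illustrating the algorithm's properties
Semistructured data, XML schemas
Two different graph-based representations are used: OEM/Lore and the XML/DOM standard. In the OEM representation, element tags are treated as edge labels; the DOM representation expresses the relationship between elements with a dedicated edge label "child". First, the algorithm produces similar results for the different representations. Second, the example shows that using a wider spectrum of edge labels makes the iterative computation faster. Although the two representations yield graphs of about the same size, the similarity propagation graph for OEM is half the size of the one for DOM, and each iteration of the fixpoint computation is faster.
Matching XML schemas using instance data
Finding related items
6. Evaluating match quality
Match accuracy
Intended match result
7. Evaluation of the algorithm and filters
8. Architecture and implementation
9. Open issues and limitations
1. The algorithm works only for directed labeled graphs. It degrades when the edge labeling is uniform (a single edge label) or undirected, or when nodes are hard to tell apart.
2. It can only match models of the same type.
3. A key assumption is that adjacency contributes to similarity propagation, so the algorithm does not work properly when adjacency information is not preserved.
4. The algorithm gives higher similarity to superstructures.
5. The algorithm does not consider order and aggregation. Taking them into account would help a lot when matching XML.
6. The stand-alone version of the algorithm is less effective than matchers developed for a specific domain.
10. Related work
11. Conclusion
References

Appendix A: Internal data model
Let U be the Unicode alphabet and U* the set of strings over U. The set of entities E and the set of statements V are defined recursively:
1. U* × U* ⊆ E (every pair of strings is an entity; the first string is the entity's type or namespace, the second is its name)
2. E × E × E ⊆ V (every tuple of three entities constitutes a statement)
3. V ⊆ E (every statement is an entity)
4. V and E are the smallest sets with the above properties.
A subset of V is called a model. The definitions and terminology above are based on the RDF standard. The recursive definition of V and E means that statements can be nested (a statement can be an element of another statement). In our internal data structures, nested statements are used to represent order and aggregation. The matching algorithm described in the paper does not currently use these features, so nested statements are not discussed further. We can make the simplifying assumption E = U* × U* and V = E³, so a model is a subset of E³. In a graph, entities are nodes and statements are edges. For any statement (s, p, o), the middle element p, called the predicate, is drawn as the label on the edge. (Statements with a common predicate define a binary relation between entities.) In OIM graphs, rectangular nodes are called [literals] and are entities of L = {"literal"} × U*. Literals are not essentially different from other entities; we distinguish them graphically mainly for readability.
A mapping between models M1 and M2 can be viewed conceptually as a set of tuples (n1, n2, σ) such that belongsTo(n1, M1), belongsTo(n2, M2), and σ is a real number representing similarity. When M1 and M2 share no elements, the mapping can be defined as a weighted undirected bipartite graph. To treat mappings as models, a mapping is represented as a set of statements. For each tuple t = (n1, n2, σ) we create four statements:
1. (node(t), type, MapEntry)
2. (node(t), src, n1)
3. (node(t), dest, n2)
4. (node(t), similarity, σ)

Appendix B: Generalized version of the algorithm

Appendix C: Propagation coefficients

Appendix D: Convergence and complexity of the algorithm
The fixpoint computation of SF can be expressed as an eigenvector computation as follows. T is a square matrix corresponding to the similarity propagation graph G obtained from models A and B. If an edge with propagation coefficient c connects j = (x, y) to i = (x', y'), the matrix entry tij = c; all other entries are 0. Note that the propagation coefficients in G correspond to transition probabilities if T is a transition matrix. The fixpoint computation converges when T is an aperiodic, irreducible matrix (ergodic theorem). T is irreducible if and only if its associated graph G is strongly connected (every node can be reached from every other node). To guarantee these properties we can introduce self-loops into G by including the summand σ0 in the fixpoint equation, e.g. σi+1 = normalize(σ0 + φ(σi)). In the literature this technique is called dampening. If σ0 assigns a nonzero value to every map pair in A × B, adding σ0 amounts to modifying G into G', in which all nodes are connected to each other with specific propagation coefficients. Let T' be the matrix associated with G'.
The eigenvector computation can then be expressed as follows. Let S be the map pair vector that holds, at each position, a similarity value from σ, for a fixed ordering of the map pairs. One iteration of the fixpoint computation corresponds to the matrix multiplication T' × S. Repeated multiplication yields the dominant eigenvector S* of T', i.e. T' × S* = λS*, where λ is the dominant eigenvalue of T'. In the fixpoint equation, normalization is done by dividing T' × S* by λ.
The fixpoint computation corresponds to analyzing a Markov chain over T. This fact offers an interesting insight into the algorithm. Because T corresponds to a transition matrix over G, the resulting similarity measure can be viewed as the stationary probability distribution over map pairs induced by a random walk from pair to pair. The random walk corresponds to the manual matching process of a human designer working on A and B. Starting from a given map pair, the designer infers the similarity of another map pair from the structural properties of A and B. Suppose A and B are models of relational schemas. If the designer concludes that table t1 in A matches table t2 in B, then with a certain probability his or her next step is to match the columns of t1 and t2. The convergence rate of the fixpoint computation depends on the ratio between the dominant eigenvalue and the second eigenvalue of T, which is determined by the structural properties of G'. Higher dampening values give the matrix a faster convergence rate.
Complexity: the number of operations in each iteration is proportional to the number of edges in the propagation graph G, which is proportional to the product of the numbers of edges in models A and B.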
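The eigenvector view in Appendix D can be illustrated with plain power iteration: multiply a vector by the matrix repeatedly and rescale after each step. This is a minimal sketch under stated assumptions. The function name is hypothetical, and the 2 × 2 matrix `T_PRIME` is made up for illustration; it is not derived from any propagation graph. Rescaling divides by the largest entry, in the same spirit as the normalization step of the fixpoint computation.

```python
def power_iteration(matrix, start, max_iter=1000, eps=1e-12):
    """Approximate the dominant eigenpair of a matrix with positive entries.

    Returns (scale, vector): the vector is rescaled so that its largest
    entry is 1, and scale converges to the dominant eigenvalue.
    """
    vec = list(start)
    scale = 0.0
    for _ in range(max_iter):
        product = [sum(row[j] * vec[j] for j in range(len(vec)))
                   for row in matrix]
        scale = max(abs(v) for v in product)
        nxt = [v / scale for v in product]
        if max(abs(a - b) for a, b in zip(nxt, vec)) < eps:
            return scale, nxt
        vec = nxt
    return scale, vec


# Made-up positive matrix with eigenvalues 3 and 1 (illustration only).
T_PRIME = [[2.0, 1.0],
           [1.0, 2.0]]

dominant, s_star = power_iteration(T_PRIME, [1.0, 0.0])
print(round(dominant, 6), [round(v, 6) for v in s_star])  # 3.0 [1.0, 1.0]
```

The error shrinks by roughly the ratio of the second eigenvalue to the dominant one (1/3 here) at each step, which is the dependence on the eigenvalue ratio described above.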

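Adjusting the Computer's Background Color to Protect Your Eyes

Desktop -> right-click -> Properties -> Appearance -> Advanced -> under Item choose (Window); under Color 1 (L) choose (Other) and set Hue: 85, Sat: 123, Lum: 205 -> Add to Custom Colors -> select the new entry under Custom colors and click OK -> OK. All documents then appear on a soft, pale bean-paste green background instead of harsh black on white. This hue is said to have been chosen by eye specialists, and using it over long periods reportedly relieves eye strain and protects the eyes.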

Tuesday, March 3, 2009

Latest Deep Web Research

Deep Web Research 2009
By Marcus P. Zillman, Published on December 28, 2008
Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed into the “invisible” or what I like to call the “deep” web. The Deep Web covers somewhere in the vicinity of 1 trillion pages of information located through the World Wide Web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. Search engines find about 20 billion pages at the time of this publication.
In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the world wide web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps, and others. These files are predominantly used by businesses to communicate information within their organization, or to disseminate information to external communities. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained from these files by searching and accessing the “properties” information on these files.
This guide is designed to provide a wide range of resources to better understand the history of deep web research. It also includes various classified resources that allow you to search through the currently available web to find key sources of information located via an understanding of how to search the “deep web”.
This Deep Web Research 2009 article is divided into the following sections:
Articles, Papers, Forums, Audios and Videos
Cross Database Articles
Cross Database Search Services
Cross Database Search Tools
Peer to Peer, File Sharing, Grid/Matrix Search Engines
Presentations
Resources - Deep Web Research
Resources - Semantic Web Research
Bot Research Resources and Sites
Subject Tracer Information Blogs
ARTICLES, PAPERS, FORUMS, AUDIOS AND VIDEOS (Current and Historical)
99 Resources to Research & Mine the Invisible Web by Jessica Hupp http://www.collegedegree.com/library/college-life/99-resources-to/
Academic and Scholar Search Engines and Sources http://www.ScholarSearchEngines.com/
All of OCLC’s WorldCat Heading Toward the Open Web by Barbara Quint http://www.infotoday.com/newsbreaks/nb041011-2.shtml
An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web by W. Wu, C. Yu, A. Doan, W. Meng http://www.cs.binghamton.edu/~meng/pub.d/sigmod04-final.pdf
Annotation for the Deep Web http://csdl.computer.org/comp/mags/ex/2003/05/x5042abs.htm
Automatic Extraction of Web Search Interfaces for Interface Schema Integration by H. He, W. Meng, C. Yu, Z. Wu http://www.cs.binghamton.edu/~meng/pub.d/WWWposterhe.pdf
Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery http://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal
Automatic Meaning Discovery Using Google by Rudi Cilibrasi and Paul M. B. Vitanyi http://arxiv.org/abs/cs.CL/0412098
Benevolent "Virus" Helps Reveal the Hidden Web http://www.syllabus.com/article.asp?id=9680
Beyond Google: The Invisible Web - Tools for Teaching the Invisible Web http://www.lagcc.cuny.edu/library/invisibleweb/teachingtools.htm
Bibliomining Bibliography http://www.bibliomining.com/
Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works by Dr. Scott Nicholson http://dlist.sir.arizona.edu/archive/00000625/
Bot Research http://www.BotResearch.info/
Client-Side Deep Web Data Extraction http://doi.ieeecomputersociety.org/10.1109/CEC-EAST.2004.30
Clustering E-Commerce Search Engines by Q. Peng, W. Meng, H. He, C. Yu http://www.cs.binghamton.edu/~meng/pub.d/WWWposterPeng.pdf
Common Information Environment Seeks To Reveal the Hidden Web http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html
Crawling the Hidden Web by Sriram Raghavan and Hector Garcia-Molina http://citeseer.ist.psu.edu/461253.html
Current Awareness Discovery Tools on the Internet http://zillman.blogspot.com/2004/09/current-awareness-discovery-tools-on.html
Data Extraction and Label Assignment for Web Databases http://www2003.org/cdrom/papers/refereed/p470/p470-wang.htm
Deep Content - Guide To Effective Searching of the Internet http://www.brightplanet.com/deepcontent/tutorials/search/index.asp
Deep Web - Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A., - 23 minutes - Internet/Technology Channel http://www.planetearthradio.com/technology.htm
Deep Web Navigation in Web Data Extraction http://snipurl.com/13xdm
Desperately seeking Web Search 2.0 http://snipurl.com/64im
DigiCULT Thematic Issue 6: Resource Discovery Technologies for the Heritage Sector, June 2004 (HiRes .pdf, 4.9 MB) http://snipurl.com/7v46
Diving in the Deep End of the Web by Suzanne Ross http://research.microsoft.com/displayArticle.aspx?id=1052
Efficient and Effective Metasearch Project http://www.cs.binghamton.edu/~meng/metasearch.html
Google Teams Up with 17 Colleges to Test Searches of Scholarly Materials By Jeffrey R. Young http://chronicle.com/free/2004/04/2004040901n.htm
Graph Structure in the Web http://www9.org/w9cdrom/160/160.html
Grey Literature http://en.wikipedia.org/wiki/Gray_literature
Grey Literature Network Service (GreyNet) http://www.greynet.org/
Gray Literature: Resources for Locating Unpublished Research by Brian S. Mathews http://www.pla.org/ala/mgrps/divs/acrl/publications/crlnews/2004/mar/graylit.cfm
Gray Literature Subject Guide http://www.csulb.edu/library/subj/gray_literature/
Information Retrieval and the Semantic Web by Tim Finin, James Mayfield, Clay Fink, Anupam Joshi, and R. Scott Cost http://ebiquity.umbc.edu/v2.1/paper/html/id/185/
In Search of the Deep Web http://archive.salon.com/tech/feature/2004/03/09/deep_web/index_np.html
Invisible Web Gets Deeper http://www.searchenginewatch.com/sereport/article.php/2162871
Invisible Web Revealed http://www.searchenginewatch.com/sereport/article.php/2167321
IR and IE on the Web - PhD and MSc Dissertations http://www.webir.org/phd.html
JEP: The Deep Web http://hdl.handle.net/2027/spo.3336451.0007.104
LLRX: Book Review: The Invisible Web http://www.llrx.com/features/invisibleweb.htm
LLRX: Deep Web Research http://www.llrx.com/features/deepweb.htm
LLRX: Deep Web Research 2005 http://www.llrx.com/features/deepweb2005.htm
LLRX: Deep Web Research 2006 http://www.llrx.com/features/deepweb2006.htm
LLRX: Deep Web Research 2007 http://www.llrx.com/features/deepweb2007.htm
LLRX: Deep Web Research 2008 http://www.llrx.com/features/deepweb2008.htm
LLRX: Mining Deeper Into the Invisible Web http://www.llrx.com/features/mining.htm
LLRX: ResearchWire: Exposing the Invisible Web http://www.llrx.com/columns/exposing.htm
Metadata? Thesauri? Taxonomies? Topic Maps! by Lars Marius Garshol http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html
Mining Newsgroups Using Networks Arising From Social Behavior http://www.almaden.ibm.com/cs/projects/iis/hdb/Publications/papers/www03_social.pdf
Mining the Deep Web: Search Strategies That Work by Lee Ratzan http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9005757&pageNumber=1
Mining the Deep Web With Specialized Drills http://lists.webjunction.org/wjlists/web4lib/2001-January/034742.html
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews http://www.kushaldave.com/p451-dave.pdf
Mining Topic-Specific Concepts and Definitions on the Web http://www.cs.uic.edu/~liub/publications/WWW-2003.pdf
Modelling and Mining of Network Information Systems Publications http://www.mathstat.dal.ca/~mominis/Publications.htm
Net Plan Builds in Search by Kimberly Patch http://snipurl.com/5kn0
Online or Invisible? http://citeseer.ist.psu.edu/online-nature01/
OntoMiner: Bootstrapping and Populating Ontologies From Domain Specific Web Sites http://www.public.asu.edu/~hdavulcu/VLDB-WS03.pdf
OpenIndex - Creating a Public Internet Index http://www.openindex.org/index.php
Out-googling Google: Federated Searching and the Single Search Box http://library.marist.edu/ACRL/Foxhunt_demo.html
PhysicsWeb: The Physics of the Web http://physicsweb.org/article/world/14/7/09
Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, Google Labs] http://labs.google.com/people/lawrence/
QProber: Classifying and Searching "Hidden-Web" Text Databases http://qprober.cs.columbia.edu/
Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources http://oedb.org/library/college-basics/research-beyond-google
Researchers Map of the Web http://www.almaden.ibm.com/almaden/webmap_press.html
Scientific American: Featured Article: The Semantic Web http://www.sciam.com/article.cfm?id=the-semantic-web
Search Engine Meeting 2005 Boston, Massachusetts - White Papers and Presentations http://www.infonortics.com/searchengines/sh05/05pro.html
Search Engine Meeting 2006 Boston, Massachusetts - White Papers and Presentations http://www.infonortics.com/searchengines/sh06/06pro.html
Search Engine Meeting 2007 Boston, Massachusetts - White Papers and Presentations http://www.infonortics.com/searchengines/sh07/07pro.html
Search Engine Meeting 2008 Boston, Massachusetts - White Papers and Presentations http://www.infonortics.com/searchengines/sh08/08pro.html
Search Engine Technology and Digital Libraries http://www.dlib.org/dlib/june04/lossau/06lossau.html
Searching the Deep Web by Alex Wright http://mags.acm.org/communications/200810/?pg=16
Searching the Deep Web http://www.dlib.org/dlib/january01/warnick/01warnick.html
Searching the Deep Web - Video http://www.osti.gov/media/DeepWebVideo.html
Searching the Deep Web Online Streaming Tutorial http://www.InformationDetective.com/
Searching the Internet (White Paper, Audio and Video) http://www.SearchingTheInternet.info/
Seeing through the 'invisible' Web http://www.usatoday.com/tech/2001/10/15/invisible-web-search.htm
SemaForm - Semantic Wrapper Generation for Querying Deep Web Data Sources http://www.ucalgary.ca/~jkwalny/502/finalreport.pdf
Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko http://derpi.tuwien.ac.at/~andrei/AURIS_DE.htm
Smart Search - Advanced Search Engines Link Many Data Sources http://gcn.com/23_24/tech-report/26999-1.html
Structured Databases on the Web: Observations and Implications http://eagle.cs.uiuc.edu/pubs/2004/dwsurvey-sigmodrecord-chlpz-aug04.pdf
Testbed for Information Extraction from Deep Web http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf
The Deep Web http://www.internettutorials.net/deepweb.html
The Deep Web: Surfacing Hidden Value by Michael K. Bergman http://hdl.handle.net/2027/spo.3336451.0007.104
The Future Of News: The Digital Information Librarian http://www.masternewmedia.org/2004/03/24/the_future_of_news_the.htm
The Hidden Potential of the Web http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html
The Invisible Web by Chris Sherman http://www.freepint.com/issues/080600.htm#feature
The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The Invisible Web: Where Search Engines Fear To Go http://www.powerhomebiz.com/vol25/invisible.htm
The Mechanics of Deep Net Meta Search http://turbo10.com/papers/deepnet.pdf
The Ultimate Guide to the Invisible Web http://oedb.org/library/college-basics/invisible-web
Timeline of Events Related to the Deep Web http://papergirls.wordpress.com/2008/10/07/timeline-deep-web/
Topological Measures and Maps Of the Web http://informatics.indiana.edu/fil/Web/
Toward the Semantic Deep Web by James Geller, Soon Ae Chun, and Yoo Jung An http://www.computer.org/portal/cms_docs_computer/computer/homepage/Sep08/r9itsys.pdf
Towards Automatic Incorporation of Search Engines Into A Large-Scale Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/wi2003.pdf
Traffic-Based Feedback on the Web by Jonathan Aizen, Daniel Huttenlocher, Jon Kleinberg, and Antal Novak http://www.pnas.org/cgi/content/abstract/0307539100v1
Travel Industry and Deep Web: Exclusive Interview with Marcus P. Zillman http://blog.relactions.com/2007/08/travel-industry-and-deep-web-exclusive.html
UMBC - AgentNews http://agents.umbc.edu/agentnews/
Understanding Metadata http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
Using the Internet As a Dynamic Resource Tool for Knowledge Discovery http://zillman.blogspot.com/2004/09/using-internet-as-dynamic-resource.html
Web Characterization Project http://wcp.oclc.org/
Web Data Extractors White Paper Link Compilation http://www.WebDataExtractors.com/
Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang Ming http://arxiv.org/pdf/cs.NI/0403035
WebScales: Towards a Highly Scalable Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html
What Is the Deep Web? A WhatIs Podcast 15 Minute Interview with Marcus P. Zillman http://zillman.blogspot.com/2006/10/what-is-deep-web.html
What is the Invisible Web? A Crawler Perspective by Natalia Arroyo, Laboratorio de Internet http://cybermetrics.wlv.ac.uk/AoIRASIST/arroyo.html
WISE-Cluster: Clustering E-Commerce Search Engines Automatically by Q. Peng, W. Meng, H. He, C. Yu http://www.cs.binghamton.edu/~meng/pub.d/PengWIDM04.pdf
Yahoo and the Deep Web http://news.com.com/2100-1024-5167931.html
CROSS DATABASE ARTICLES

Basic Functional Requirements for Cross Search Service http://www.icbl.hw.ac.uk/perx/basicfunctionalrequirements.htm
Digital Libraries- Cross-Database Search: One-Stop Shopping http://www.libraryjournal.com/article/CA170458.html
Search Tools Reports: Searching for Text Information in Databases http://www.searchtools.com/info/database-search.html
The Right Solution: Federated Search Tools by Roy Tennant http://snipurl.com/5zxp
UK Web Archiving Consortium http://www.webarchive.org.uk/
CROSS DATABASE SEARCH SERVICES
ARC - A Cross Archive Search Service http://arc.cs.odu.edu/
Entrez - The Life Sciences Cross-Database Search Engine http://www.ncbi.nlm.nih.gov/Entrez/index.html
EnergyFiles - Subject Pathways http://energyfiles.osti.gov/
GPO Access - Search Across Multiple Databases http://www.gpoaccess.gov/multidb.html
King County Library System http://www.kcls.org/
NLM Gateway Search http://gateway.nlm.nih.gov/gw/Cmd
SUMSearch http://sumsearch.uthscsa.edu/
Scitopia - Deep Federated Search http://www.scitopia.org/scitopia/
The Metasearch Infrastructure Project http://www.cdlib.org/inside/projects/metasearch/
CROSS DATABASE SEARCH TOOLS
Bright Planet http://brightplanet.com/
Copernic http://www.copernic.com/en/index.html
Cross Database Search Tools Summary http://lists.webjunction.org/wjlists/web4lib/2001-September/027669.html
Dieselpoint Java Search and Navigation Software http://www.dieselpoint.com/
DbVisualizer - The Universal Database Tool http://www.dbvis.com/products/dbvis/
Dublin Core Metadata Initiative (DCMI) http://www.dublincore.org/
EEVL Xtra - Cross Database Search http://www.ariadne.ac.uk/issue44/eevl/
EMC http://software.emc.com/
Gold Rush - Database Search Tool http://goldrush.coalliance.org/
MetaLib http://www.exlibrisgroup.com/metalib.htm
MetaSearch Initiative http://www.niso.org/workrooms/mi
Project - Getting OAI-PMH For Free http://www.modoai.org/
MuseGlobal http://www.museglobal.com/
Peter's PolySearch Engines http://www2.hawaii.edu/~jacso/extra/poly-page.html
PBCore - The Public Broadcasting Metadata Dictionary http://www.utah.edu/cpbmetadata/
Registry of Library Knowledge Bases http://www.public.iastate.edu/~CYBERSTACKS/KBL.htm
Search Federal Research and Development http://fedrnd.osti.gov/
SRU - Search/Retrieve via URL http://www.loc.gov/standards/sru
STINET Multisearch http://multisearch.dtic.mil/
The Flamenco Search Interface Project http://bailando.sims.berkeley.edu/flamenco.html
VIAF: The Virtual International Authority File http://www.oclc.org/research/projects/viaf/default.htm
WebFeat http://www.webfeat.org/
PEER TO PEER (P2P), FILE SHARING, GRID AND MATRIX SEARCH ENGINES
ALPINE Network - SourceForge: Project http://sourceforge.net/projects/alpine/
An Efficient Scheme for Query Processing on Peer-to-Peer Networks http://aeolusres.homestead.com/files/index.html
angrycoffee.com http://www.AngryCoffee.com/
Azureus - Vuze Java Bittorrent Client http://azureus.sourceforge.net/
BadBlue http://badblue.com/
Between Rhizomes and Trees: P2P Information Systems by Bryn Loban http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1182
Bibster http://bibster.semanticweb.org/index.htm
BigChampagne http://www.bigchampagne.com/
BitTorrent FAQ and Guide http://www.dessent.net/btfaq/
Bit Torrent Official Site and Search Engine http://www.BitTorrent.com/
Bitzi - The Free Universal Media Catalog http://www.bitzi.com/
Blubster http://www.blubster.com/
BotSpot®: File-sharing Bots http://www.botspot.com/BOTSPOT/Windows/Download_Bots/File-sharing_Bots/
BTjunkie - Bittorrent Search Engine http://www.btjunkie.org/
Coral - The Coral P2P Content Distribution Network http://www.coralcdn.org/
Capn's PHP Gnutella Search http://capnbry.net/gnutella/gs.php
Crackle - Stream On http://www.crackle.com/
Current P2P Search Implementations - P2P Networks http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p8.html#CurrentP2PSearchImplementations
Deepnet Explorer - P2P/RSS-ATOM Web Browser http://www.deepnetexplorer.com/
Distributed Search Engines http://www.openp2p.com/pub/t/74
Distributed Search in P2P Networks http://csdl.computer.org/comp/mags/ic/2002/01/w1068abs.htm
FAROO - P2P Web Search http://www.faroo.com/
Filetopia http://www.filetopia.org/
Free Haven Project http://www.freehaven.net/index.html
Frost Project - Freenet Messaging and File Sharing Client http://jtcfrost.sourceforge.net/
FuzzBox: Tangent Research Artificial Intelligence and Robotics http://tangentresearch.com/news/07252001_p2p_ai.html
GNUnet - GNU Project - Free Software Foundation (FSF) http://www.gnu.org/software/GNUnet/gnunet.html
GRACE IST Project http://www.grace-ist.org/
GRACE - GRid seArch and Categorization Engine http://www.ub.uni-stuttgart.de/grace/
Grid Resources http://www.GridResources.info/
Grokster3G http://www.grokster3g.com/
grub.org - Open Source, Distributed Internet Crawler! http://grub.org/
HyperCuP – Shaping Up Peer-to-Peer Networks http://www-db.stanford.edu/~schloss/hypercup/
Ian Clarke's Blog http://blog.locut.us/
IM and P2P Threat Center http://www.symantec.com/business/security_response/
iMesh http://www.iMesh.com/
International Workshop on Peer-to-Peer Knowledge Management (P2PKM) http://www.p2pkm.org/
Internet Movie Database (IMDb) http://www.imdb.com/
isoHunt - IRC and Bit Torrent Search Engine http://isohunt.com/
JXTA Project https://jxta.dev.java.net/
Kademlia: A Peer-to-peer Information System Based on the XOR Metric http://citeseer.ist.psu.edu/529075.html
Kazaa Media Desktop http://www.kazaa.com/us/index.htm
LegalTorrents http://www.legaltorrents.com/
Limewire http://www.limewire.com/
LionShare P2P Project - Legitimate File-Sharing Among Individuals and Educational Institutions http://lionshare.its.psu.edu
Lphant - The Full P2P Solution http://www.lphant.com/
MoleSter - A Tiny File-Sharing Application http://ansuz.sooke.bc.ca/software/molester/
Mnet http://mnet.sourceforge.net/
MusicBrainZ http://www.MusicBrainZ.org/
MysterNetworks - The Evolution of Peer-to-Peer http://www.mysternetworks.com/
NeuroGrid - P2P Search http://www.neurogrid.net/
Open Directory - File Sharing http://dmoz.org/Computers/Software/Internet/Clients/File_Sharing/
Open Directory - MP3 Search Engines http://dmoz.org/Arts/Music/Sound_Files/MP3/Search_Engines/
OpenNap: Open Source Napster Server http://opennap.sourceforge.net/
OpenP2P.com http://www.openp2p.com/
Oyster - Managing, Searching and Sharing Ontology Metadata in a Peer-to-Peer Network. http://oyster.ontoware.org/
P2P and the Future of Private Copying by Peter K. Yu, Michigan State University College of Law http://papers.ssrn.com/sol3/papers.cfm?abstract_id=578568
P2PNet - Updated P2P News http://p2pnet.net/index.php
P2P News from Topix http://www.topix.net/tech/p2p
PeerCast P2P Radio http://www.peercast.org/
PeerMind - P2P Monitor http://www.PeerMind.com/
Piolet http://www.piolet.com/
Port Knocking http://www.portknocking.org/
PowerFolder - P2P Whole Folder Synchronization http://www.powerfolder.com/
Rodi - Tiny P2P Client/Host http://larytet.sourceforge.net/btRat.shtml
ScrapeTorrent http://www.ScrapeTorrent.com/
Skype http://www.skype.com/
Slyck - File Sharing News and Info http://www.slyck.com/index.php
Snoopstar http://www.snoopstar.com/
Speckly - Torrent Search Simplified http://speckly.com/
Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks http://citeseer.ist.psu.edu/nejdl02superpeerbased.html
SwarmStream™ SDK http://onionnetworks.com/products/swarmstream/
The Anthill Project http://www.cs.unibo.it/projects/anthill/
The Pirate Bay - BitTorrent Tracker http://thepiratebay.org/
The Chord Project http://pdos.csail.mit.edu/chord/
The Freenet Project http://freenetproject.org/
The Peer-to-Peer Weblog http://p2p.weblogsinc.com/
The Role of Peer to Peer File Sharing in Law Firm Marketing by Andy Havens http://www.llrx.com/columns/marketing7.htm
ToPeer http://www.topeer.com/
Torrent Finder http://ts.kurtubba.com/
Torrent Reactor http://www.torrentreactor.net/
Torrent Typhoon (TT) http://www.torrenttyphoon.com/
Tranche Project - Secure P2P for the Scientific Community http://tranche.proteomecommons.org/
Tribler - A Social Community That Facilitates Filesharing Through P2P http://www.tribler.org/
TrustyFiles http://www.trustyfiles.com/
Understanding BitTorrent: An Experimental Perspective by Arnaud Legout, Guillaume Urvoy-Keller, and Pietro Michiardi http://hal.inria.fr/inria-00000156/en
URLBlaze: URL Sharing Network http://www.urlblaze.com/
Videora - Personal Video Using P2P and RSS http://www.videora.com/
WASTE http://slackerbitch.free.fr/waste/
WiPeer - Serverless Peer to Peer Collaboration http://www.wipeer.com/
YaCy - Distributed P2P Based Web Indexing and Anonymous Search Engine http://www.yacy.net/
Yahoo! Directory Peer-to-Peer File Sharing http://dir.yahoo.com/Computers_and_Internet/Internet/Peer_to_Peer_File_Sharing/
YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology http://citeseer.ist.psu.edu/ganesan03yappers.html
YouServ - A P2P (peer-to-peer) Web Hosting/File Sharing System http://www.bayardo.org/youserv/
Zebra http://indexdata.dk/zebra/
PRESENTATIONS
From Theory To Practice - Bielefeld Academic Search Engine http://www.diglib.org/forums/spring2004/presentations/summann-2004-04.pdf
Gumshoe Librarian http://www.llrx.com/features/gumshoe.htm
Quick Introduction to OWL Web Ontology Language http://www.iro.umontreal.ca/~lapalme/ift6281/OWL/CostelloQuickIntroOwl.pdf
Searching the Internet and the Invisible Web http://www.InformationDetective.com/
The Future of the Internet: Bots, Blogs and News Aggregators http://www.zillman.tv/
RESOURCES - Deep Web Research
A Roadmap for Web Mining: From Web to Semantic Web http://eprints.pascal-network.org/archive/00000841/01/roadmap.pdf
Beaucoup http://www.beaucoup.com/
BlogPulse http://www.BlogPulse.com/
Bot Research http://www.BotResearch.info/
BrainBoost - Question Answering Search Engine http://www.BrainBoost.com/
BrightPlanet's Deep Federation Portal™ (DFP) http://www.brightplanet.com/products/dfportal.asp
Can't Find On Google http://www.cantfindongoogle.com
COLLATE - Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material http://www.collate.de/
Comet Way http://www.cometway.com/content.agent?page_name=Home
CompletePlanet - 70,000 Databases and Speciality Search Engines http://www.completeplanet.com/
Creative Commons RDF-Enhanced Search http://search.creativecommons.org/
Cuil Search - Search 121,617,892,992 Web Pages http://www.cuil.com/
Cyber Cemetery http://govinfo.library.unt.edu/
CyberFiber http://www.cyberfiber.com
Cybermetrics - First Generation Tools - Invisible Web http://www.cindoc.csic.es/cybermetrics/search13.html
Data Fountains: Open Source Internet Resource Discovery and Metadata/Full-Text Generation Service http://infomine.ucr.edu/Data_Fountains/
Data Mining Resources http://www.DataMiningResources.info/
DeepDyve - Deep Web Search Engine http://www.deepdyve.com/
Deep Web Research http://www.DeepWebResearch.info/
Deep Web Technologies http://www.deepwebtech.com/
DigiCULT Resources - Resource Discovery & Information Retrieval http://www.digicult.info/pages/resources.php?t=21
digitalAGORA http://aut.edu/agora/
Directory Resources http://www.DirectoryResources.info/
Direct Search http://www.freepint.com/gary/direct.htm
eFinancial Bot Deep Meta Search Engine http://www.eFinancialBot.com/
eHealthcare Bot Deep Meta Search Engine http://www.eHealthcareBot.com/
eMarketing Bot Deep Meta Search Engine http://www.eMarketingBot.com/
ENDECA http://www.endeca.com/
Engineering Village 2 http://www.engineeringvillage2.org/
Hakia - Search For Meaning http://www.hakia.com/
Find Articles http://www.findarticles.com/PI/index.jhtml
Freely Accessible Databases for the Public http://www.istl.org/01-winter/internet.html
Ghostscript, Ghostview and GSview http://www.cs.wisc.edu/~ghost/
GlobalSpec - Engineering Search Engine http://search.globalspec.com/Search/WebSearch
Google Labs http://labs.google.com/
Google Scholar http://scholar.google.com/
HighWire Press - Largest Repository of Free Full-Text Life Science Articles in the World http://highwire.stanford.edu/
iBoogie™ http://www.iboogie.tv/
IncyWincy - The Invisible Web Search Engine http://www.incywincy.com/
INFOMINE http://infomine.ucr.edu/
Instant Information Systems http://www.docdel.com/
Institutional Archives Registry http://archives.eprints.org/eprints.php?action=browse
Intelligence Center http://www.intelligence-center.com/
Intellisonar™ http://www.quigo.com/intellisonar.htm
Internet Archive http://www.archive.org/
Internet Search Environment Number (ISEN) http://www.isen.org/
Intute http://www.intute.ac.uk/
Invisible Library http://sanchezkisser.com/blog/
Kapow Web Collector http://www.automated-info-solutions.com/
KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide http://www.kdnuggets.com/
KeepMedia http://www.keepmedia.com/
Knowledge Discovery http://www.KnowledgeDiscovery.info/
Large-Scale Deep Web Integration: Incomplete Bibliography http://metaquerier.cs.uiuc.edu/webibib.html
Librarians' Index to the Internet http://lii.org/
MagPortal http://www.magportal.com/
Mamma - Deep Web Search Engine http://www.mamma.com/
Mappa.Mundi Magazine http://mappa.mundi.net/
Microsoft Web Search Research and Patents http://www.webmasterworld.com/forum97/5.htm
Mining the Deep Web for Economic Data http://www.citris-uc.org/research/projects/mining_the_deep_web_for_economic_data
Mooter Search http://www.mooter.com/
MSN Sandbox http://sandbox.msn.com/
News Group Search http://newsgroups.langenberg.com/
New Zealand Digital Library http://www.nzdl.org/
OAI-PMH Implementation Guidelines - Conveying rights expressions about metadata in the OAI-PMH framework http://www.openarchives.org/OAI/2.0/guidelines-rights.htm
OAIster http://oaister.umdl.umich.edu/o/oaister/
OneLook Dictionary Search http://www.onelook.com/
Open Archives Initiative http://www.openarchives.org/
OpenIndex - Creating a Public Internet Index http://www.openindex.org/index.php
QProber: Classifying and Searching "Hidden-Web" Text Databases - PERSIVAL Project http://qprober.cs.columbia.edu/
Quigo Technologies http://www.quigo.com/
Powerset - Natural Language Semantic Based Web Search Engine http://www.powerset.com/
Pretrieve Search - Free Public Record Search Engine http://www.pretrieve.com/
Recommended Gateway Sites for the Deep Web http://people.hws.edu/hunter/deepwebgate03.htm
Science Accelerator - Search Key Resources from DOE OSTI http://www.scienceaccelerator.gov/
reSearcher http://researcher.sfu.ca/
Science and Technology Sources on the Internet http://www.library.ucsb.edu/istl/01-winter/internet.html
Scientific and Technical Information Network (STINET) http://stinet.dtic.mil/
Science Commons http://sciencecommons.org/
Science.gov - FirstGov for Science - Government Science Portal http://www.science.gov/
Scirus - Search Engine for Scientific Information http://www.scirus.com/srsapp/
SDARTS - A Protocol and Toolkit for Metasearching http://sdarts.cs.columbia.edu/
Search Adobe PDF Online http://www.SearchPDF.com/
STN International - Databases in Science and Technology http://www.stn-international.de/
Swoogle - Semantic Bot http://swoogle.umbc.edu/
TechDeepWeb - How-To Guide to the Deep Web for IT Professionals http://www.TechDeepWeb.com/
TechXtra - Indepth Academic and Scholar Search http://www.techxtra.ac.uk/
Testbed for Information Extraction from Deep Web http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf
The Internet Sleuth http://www.isleuth.com/
The Deep Web http://www.internettutorials.net/deepweb.html
The Invisible Web http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
THOR: Deep Web Data Extraction http://www.cc.gatech.edu/projects/disl/THOR/
Those Dark Hiding Places: The Invisible Web Revealed http://www.robertlackie.com/invisible/index.html
Turbo10 http://turbo10.com/
UNESCO Information Services - Databases http://www.unesco.org/unesdi/
Wall Street Executive Library http://www.executivelibrary.com/
Web Data Extractors http://www.WebDataExtractors.com/
Web Farming http://webfarming.com/
WebFountain™ http://www.research.ibm.com/journal/sj/431/gruhl.html
Web Intelligence Consortium http://wi-consortium.org/
Web IR & IE http://www.webir.org/
WebScales: Towards a Highly Scalable Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html
Web-Searching Agents http://www.aaai.org/AITopics/html/webagent.html
RESOURCES – Semantic Web Research
AIS SIGSEMIS - SIGSEMIS: Semantic Web and Information Systems http://www.sigsemis.org/
Analyzing Social Networks on the Semantic Web http://snipurl.com/cbdq
Bibster http://bibster.semanticweb.org/index.htm
Combining RDF and OWL with SOAP for Semantic Web http://www.ida.liu.se/~yuxzh/doc/ncws-041002.pdf
DARPA Agent Markup Language http://www.daml.org/
DBin Project - Semantic Web P2P and/or Semantic Newsgroup Client. http://www.dbin.org/
DERI International - Digital Enterprise Research Institute http://www.deri.org/
Digital Object Identifier (DOI) http://www.doi.org/
Fabl - A Native Programming Language for the Semantic Web http://fabl.net/
FOAF Project - A Semantic Web Application http://www.foaf-project.org/
Foundation for Intelligent Physical Agents (FIPA) http://www.fipa.org/
Go3R - Knowledge Based Semantic Search Engine To Avoid Animal Experiments http://www.go3r.org/
hakia - Search for Meaning http://www.hakia.com/
HP Labs Semantic Web Research http://www.hpl.hp.com/semweb/index.html
Infomesh's Semantic Web Introduction http://infomesh.net/2001/swintro/
International Journal of Metadata, Semantics and Ontologies (IJMSO) http://www.inderscience.com/browse/index.php?journalCODE=ijmso
International Journal on Semantic Web and Information Systems (IJSWIS) http://www.ijswis.org/
Jena – A Semantic Web Framework for Java http://jena.sourceforge.net/
Journal of Web Semantics http://snipurl.com/15sdr
Journal of Web Semantics: Preprint Server http://www.websemanticsjournal.org/
Knowledge Discovery http://www.KnowledgeDiscovery.info/
KnowledgeNets http://www.inf.fu-berlin.de/inst/ag-nbi/research/wissensnetze/
Knowledge Search http://www.KnowledgeSearch.org/
Language Engineering for the Semantic Web: A Digital Library for Endangered Languages http://informationr.net/ir/9-3/paper176.html
Magpie - The Semantic Filter and Tool For the Semantic Web http://kmi.open.ac.uk/projects/magpie/main.html
MetaData at W3C http://www.w3.org/Metadata/
Metadata FAQ - Metadata for Education http://www.cetis.ac.uk/metadatafaq/FrontPage
MindRaider - Semantic Web Outliner http://mindraider.sourceforge.net/
MindSwap http://www.MindSwap.org/
MuseoSuomi http://www.museosuomi.fi/
OASIS - Advancing eBusiness Standards http://www.oasis-open.org/home/index.php
OIL - Ontology Inference Layer http://www.ontoknowledge.org/oil/index.shtml
Ontologies for Education (O4E) http://o4e.iiscs.wssu.edu/xwiki/bin/view/Blog/About
Ontology Matching http://www.ontologymatching.org/
Ontology Metadata Vocabulary (OMV) http://omv.ontoware.org/
OntoWare http://ontoware.org/
O'Reilly's Semantic Web Primer http://www.xml.com/pub/a/2000/11/01/semanticweb/
Potential Advantages Of Semantic Web For Internet Commerce by Yuxiao Zhao and Kristian Sandahl http://www.ida.liu.se/~yuxzh/doc/iceis-030120.pdf
Powerset - Natural Language Semantic Based Web Search Engine http://www.powerset.com/
pOWL - Semantic Web Development Platform http://powl.sourceforge.net/
Practical Semantic Analysis of Web Sites and Documents http://citeseer.ist.psu.edu/despeyroux04practical.html
RDF Context Tools http://www.dbin.org/RDFContextTools.php
RDF - Resource Description Framework http://www.w3.org/RDF/
Rules and Rule Markup Languages for the Semantic Web - RuleML-2003 http://www.informatik.uni-trier.de/~ley/db/conf/semweb/ruleml2003.html
Science and the Semantic Web http://www.mindswap.org/Science/
Semantic Blogging: Spreading the Semantic Web Meme http://jena.hpl.hp.com/~stecay/papers/xmleurope2004/040420_semblog_draft10.html
Semantic Desktop Environment - gnowsis http://www.gnowsis.org/
Semantic Email by Luke McDowell, Oren Etzioni, Alon Halevy, and Henry Levy http://www.cs.usna.edu/~lmcdowel/
Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) http://simile.mit.edu/
Semantic Knowledge Technologies and Language Computation http://gate.ac.uk/projects/sekt/
Semantic Markup Deconstructed Example http://www.cs.umd.edu/users/hendler/sciam/walkthru.html
Semantic Routing BOF http://www.neurogrid.net/SemanticRouting/SemanticRoutingBOF.htm
Semantic Translator for Enhanced Retrieval by the Bremen University (BUSTER) http://www.informatik.uni-bremen.de/agki/www/buster/new/application.html
SemanticWeb.org - The Semantic Web Community Portal http://www.semanticweb.org/
Semantic Web Activity Statement http://www.w3.org/2001/sw/Activity.html
Semantic Web Application Platform - SWAP http://www.w3.org/2000/10/swap/
Semantic Web Feeds http://semanticwebfeeds.com/
Semantic Web for AURIS-MM http://derpi.tuwien.ac.at/~andrei/AURIS-MM-plan.html
Semantic Web Laboratory http://iit-iti.nrc-cnrc.gc.ca/business-affaire/sem-web-lab_e.html
Semantic Web Primer for Object-Oriented Software Developers http://www.w3.org/TR/2006/NOTE-sw-oosd-primer-20060309/
Semantic Web Publications http://www.w3.org/2001/sw/#pub
Semantic Web Roadmap http://www.w3.org/DesignIssues/Semantic.html
Semantic Web Services Challenge http://www.sws-challenge.org/
Semantic Web W3C http://www.w3.org/2001/sw/
SemText - Semantic Hypertext - Making Latent Semantics Blatant http://semtext.org/mambo/index.php
SIG SEMIS Semantic Web and Information Systems http://www.sigsemis.org/
SIMAC - Foafing the Music - Semantic Interaction with Music Audio Contents http://foafing-the-music.iua.upf.edu/
SIMILE Project - Semantic Interoperability of Metadata and Information in unLike Environments http://simile.mit.edu/
Sindice - The Semantic Web Index http://sindice.com/
SOAPAgent - An Open SOAP Directory http://soapagent.com/
SourceForge.net: Project Info - OWL API http://sourceforge.net/projects/owlapi
Swoogle - Semantic Bot http://swoogle.umbc.edu/
SWRL: A Semantic Web Rule Language Combining OWL and RuleML http://www.daml.org/2003/11/swrl/
Technology Review: Sir Tim Berners-Lee - The Semantic Web http://www.technologyreview.com/articles/04/10/frauenfelder1004.asp
The Cover Pages http://xml.coverpages.org/
The Memetic Web http://www.memeticweb.org/
The ontoprise® GmbH http://www.ontoprise.de/
The RDF Query Language (RQL) http://139.91.183.30:9090/RDF/RQL/
The Semantic Grid http://www.semanticgrid.org/
The Semantic Social Network by Stephen Downes http://www.downes.ca/cgi-bin/website/view.cgi?dbs=Article&key=1076791198
The Semantic Web: An Introduction http://infomesh.net/2001/swintro/
The Semantic Web By Tim Berners-Lee, James Hendler and Ora Lassila http://snipurl.com/297g
The Semantic Web In Breadth http://logicerror.com/semanticWeb-long
The Semantic Indexing Project - Creating Tools To Identify the Latent Knowledge Found in Text http://www.knowledgesearch.org/
The Semantic Web Is Your Friend http://www.freepint.com/issues/270504.htm#feature
Transforming and Enriching Documents for the Semantic Web by Dietmar Roesner, Manuela Kunze, Sylke Kroetzsch http://arxiv.org/abs/cs.AI/0501096
Twine - A Semantic Web Application That Allows You To Share, Organize, and Find Information http://www.twine.com/
UDDI - Universal Description, Discovery, and Integration http://uddi.xml.org/
Web Semantics: Science, Services and Agents on the World Wide Web http://www.sciencedirect.com/science/journal/15708268
Web Service Modeling Ontology http://www.wsmo.org/
Wilbur Toolkit for Semantic Web Programming http://wilbur-rdf.sourceforge.net/
World Wide Web Reference http://www.WWWReference.info/
XML.com: Semantic Web http://www.xml.com/pub/rg/Semantic_Web
XML.org http://www.xml.org/
Yahoo Groups - SemanticWeb http://groups.yahoo.com/group/semanticweb/
BOT RESEARCH RESOURCES AND SITES
1st Spot http://1st-spot.net/topic_agents.html
Agent Construction Tools http://www.agentbuilder.com/
AgentLand http://www.agentland.com/
AgentLink http://www.AgentLink.org/
Agent Model Yields Leadership http://snipurl.com/99mh
Agent Portal AI http://www.agent.ai/
Agents http://www.aaai.org/AITopics/html/agents.html
AgentSheets - Authoring Tool to Create Agents http://www.agentsheets.com/
Alarm Growing Over Bot Software by Robert Lemos http://news.com.com/2100-7349_3-5202236.html?tag=nefd.lede
ALICEBot http://www.alicebot.org/
Android World http://www.androidworld.com/index.htm
Applied Soft Computing http://www.sciencedirect.com/science/journal/15684946
B.4.1 Search Robots - The Robots.txt File http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1
Bookmach - Track Your Favorite Subject Using Sticky Zine and Blog Search http://www.Bookmach.com/
Bot A Blog http://www.BotABlog.com/
Bots, Blogs and News Aggregators http://www.BotsBlogs.com
BotSpot® http://www.botspot.com/
BrowseEngine - Real-Time Meta-Data Search Engine http://www.browseengine.com/
Build a Web Spider on Linux - A Simple Spider and Scraper Collects Internet Content http://snipurl.com/128e6
Cetus Links - Mobile Agents http://www.cetus-links.org/oo_mobile_agents.html
ChatterBots http://www.ChatterBots.info/
Connotate - Intelligent Agent Technology and Competitive Intelligence Tools http://www.connotate.com/intelligent_software_agents.aspx
Data Mining Resources http://www.DataMiningResources.info/
DataparkSearch Engine - Full-Featured Open Source Web-Based Search Engine http://www.dataparksearch.org/
DataStructures http://www.DataStructures.info/
Deep Web Research http://www.deepwebresearch.info/
Design of a Parallel and Distributed Web Search Engine by Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri http://arxiv.org/abs/cs.IR/0407053
Dictionary of Algorithms and Data Structures http://www.nist.gov/dads/
Eliza - The Original ChatterBot http://www-ai.ijs.si/eliza/eliza.html
FAME (Facilitating Agents in Multiculture Exchange) Project http://cordis.europa.eu/fetch?ACTION=D&CALLER=PROJ_IST&RCN=58337
Fantomas Spider Spy™ The BotBase http://fantomaster.com/fasvsspy01.html
Foundation for Intelligent Physical Agents http://www.fipa.org/
FyberSearch http://www.fybersearch.com/
GeneSys Middleware http://sourceforge.net/projects/genesys-mw/
Google Guide http://www.googleguide.com/
IEI's Graphical Programming Toolbox http://www.imagination-engines.com/gpt.htm
iMacros™ - Browser Based Macro Recorder and Intelligent Agent http://wiki.imacros.net/Main_Page
Imagination Engines http://www.imagination-engines.com/
Indexing Robot Crawler Checklist http://www.searchtools.com/robots/robot-checklist.html
Institute for Human and Machine Cognition (IHMC) http://www.ihmc.us/
Intellexer - Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing http://www.intellexer.com/
International Journal of Agent-Oriented Software Engineering (IJAOSE) http://www.inderscience.com/ijaose
Internet Mathematics http://www.InternetMathematics.org/
KiwiLogic http://www.kiwilogic.com/
Knowledge Discovery http://www.knowledgediscovery.info/
Koders - Source Code Search Engine http://koders.com/
LAIR - Research Projects of the Laboratory of Applied Informatics Research http://lair.indiana.edu/research/
List of User-Agents (Spiders, Robots, Crawler, Browser) http://www.psychedelix.com/agents/index.shtml
Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments by Dave Cliff and Janet Bruten http://www.hpl.hp.com/techreports/97/HPL-97-91.html
MIT Media Lab: Software Agents http://agents.media.mit.edu/index.html
Modelling and Mining of Network Information Systems http://www.mathstat.dal.ca/~mominis/index.html
MultiAgent http://www.MultiAgent.com/
MySpiders http://myspiders.informatics.indiana.edu/
OpenKapow - Serving Mashups For the Long Tail of the Web http://www.openkapow.com/
Open Source Web Information Retrieval (OSWIR05) http://www.emse.fr/OSWIR05/
Oxyus Search Engine http://sourceforge.net/projects/oxyus/
ParsCit Project - Reference String Parsing http://wing.comp.nus.edu.sg/parsCit/
PhpDig.net - Web Spider and Search Engine http://www.phpdig.net/
Robots.Txt Checker - Validator for Robots.txt Files http://tool.motoricerca.info/robots-checker.phtml
RobotsTxt.org http://www.robotstxt.org/
Searchbots - Uniquely Searching the Internet http://www.Searchbots.net/
Search Engine Robots http://www.jafsoft.com/searchengines/webbots.html
Search Engine Watch News http://www.searchenginewatch.com/
Search Tools - Information Guides and News http://www.searchtools.com/
Semantic Indexing and Search http://www.knowledgesearch.org/
Semantic Web http://www.semanticweb.org/
ShoppingBots http://www.ShoppingBots.info/
SiteMaps.org http://www.SiteMaps.org/
Smarter Bots http://www.SmarterBots.com/
SocSciBot3 and SocSciBot 4 http://socscibot.wlv.ac.uk/
Spider Hunter http://www.spiderhunter.com/
Spidering Hacks http://www.oreilly.com/catalog/spiderhks/
Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs http://spinn3r.com/
Structure and Interpretation of Computer Programs - Video Lectures by Hal Abelson and Gerald Jay Sussman http://www.swiss.ai.mit.edu/classes/6.001/abelson-sussman-lectures/
Supybot, A Superb Python IRC Bot http://freshmeat.net/projects/supybot/?branch_id=31808&release_id=181322
Swoogle - Semantic Bot http://swoogle.umbc.edu/
The Intelligent Software Agents Lab http://www-2.cs.cmu.edu/~softagents/
The Lemur Toolkit - Language Modeling and Information Retrieval Research http://www.lemurproject.org/
The Search Engine Project (TSEP) http://freshmeat.net/projects/tsep/
The Simon Laven Page http://www.simonlaven.com/
The Web Robots Pages http://www.robotstxt.org/wc/robots.html
TSEP - The Search Engine Project http://www.tsep.info/
UMBC AgentWeb http://agents.umbc.edu/
UMBC eBiquity http://ebiquity.umbc.edu/
Webbot - the W3C libwww Robot http://www.w3.org/Robot/
Web Curator Tool (WCT) http://webcurator.sourceforge.net/
Web Data Extractors - White Paper Link Compilation http://www.WebDataExtractors.com/
Web Information Retrieval/Natural Language Processing Group (WING) http://wing.comp.nus.edu.sg/portal/
Web Intelligence Consortium http://wi-consortium.org/
Web IR & IE http://www.webir.org/
Words, Extended - Internet Text Information Retrieval, Extraction and Display Bot http://home.earthlink.net/~glenn_scheper/
