Ticket #67 (closed defect: fixed)

Opened 7 years ago

Last modified 4 years ago

Erroneously detected missing links

Reported by: UdoBorkowski Owned by: MartinBudden
Priority: minor Milestone: 2.4.2
Component: core Version:
Severity: low Keywords:
Cc:

Description

In the current implementation the tiddler text is checked for WikiWords and "pretty links" after every change (pressing "done" in the tiddler menu), without looking at any context (e.g. whether a word is in a code block, a comment, etc.). This makes determining the links very fast, but also somewhat inexact. A more precise solution requires a more detailed analysis, taking into account the various formatters and macros and whether the text they contain is wikified or not. This is further complicated by the fact that new formatters may be added by plugins: ideally, text handled by these formatters would also produce the correct "links".

Adding an optional "per formatter" linkifier function could overcome this problem. See this discussion for details.
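To make the trade-off concrete, here is a minimal sketch contrasting a context-free scan with the per-formatter idea. Everything in it is hypothetical: the simplified `WIKIWORD` pattern, the three-entry `formatters` table and its `linkify` hook are illustrations only, not TiddlyWiki's actual formatter API.

```javascript
// Simplified WikiWord pattern (hypothetical; TiddlyWiki's real regexp is richer):
// a capitalised word followed by at least one more capitalised segment.
const WIKIWORD = /\b[A-Z][a-z]+(?:[A-Z][a-z]*)+\b/g;

// Context-free scan: every WikiWord anywhere in the text counts as a link.
function scanWikiWords(text) {
  return [...new Set(text.match(WIKIWORD) || [])];
}

// Hypothetical formatter table. Each entry matches its own markup at a given
// position (sticky "y" flag) and supplies a linkify() hook for its body.
const formatters = [
  { name: "comment",    regex: /\/%[\s\S]*?%\//y,       linkify: () => [] },
  { name: "monospaced", regex: /\{\{\{[\s\S]*?\}\}\}/y, linkify: () => [] },
  // Bold text is wikified, so its body is scanned like ordinary text.
  { name: "bold",       regex: /''([\s\S]*?)''/y,       linkify: (m) => scanWikiWords(m[1]) },
];

// Context-aware scan: formatters claim their spans and report their own links;
// only unclaimed text is scanned directly for WikiWords.
function linkify(text) {
  const links = new Set();
  let plain = "";
  let pos = 0;
  while (pos < text.length) {
    let claimed = false;
    for (const f of formatters) {
      f.regex.lastIndex = pos;
      const m = f.regex.exec(text);
      if (m) {
        f.linkify(m).forEach((l) => links.add(l));
        plain += " "; // keep neighbouring words apart
        pos += m[0].length;
        claimed = true;
        break;
      }
    }
    if (!claimed) {
      plain += text[pos];
      pos += 1;
    }
  }
  scanWikiWords(plain).forEach((w) => links.add(w));
  return [...links];
}

const sample = "SeeAlso {{{CodeWord}}} /%HiddenWord%/ ''BoldLink''";
console.log(scanWikiWords(sample)); // SeeAlso, CodeWord, HiddenWord and BoldLink
console.log(linkify(sample));       // only BoldLink and SeeAlso
```

A plugin adding a new formatter would then only need to add an entry with its own `linkify` hook, which is the extensibility the description asks for.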

Change History

Changed 7 years ago by UdoBorkowski

  • summary changed from Don't recognize WikiWords as links when in code blocks or comments to Mentioned that URLs may also introduce WikiWord links

It was also reported in this discussion that URLs like this

http:\\domain.com\SomeWord\index.html

make the WikiWord (here "SomeWord") appear in the list of missing tiddlers.

So WikiWords in URLs should not be considered links either.
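A minimal sketch of the idea, assuming URLs can be recognised by a scheme followed by non-space characters. `URL_LIKE`, `WIKIWORD` and `wikiWordsOutsideUrls` are hypothetical names with simplified patterns, not TiddlyWiki's own `config.textPrimitives` values.

```javascript
// Simplified WikiWord pattern (hypothetical, not TiddlyWiki's real regexp).
const WIKIWORD = /\b[A-Z][a-z]+(?:[A-Z][a-z]*)+\b/g;
// Hypothetical URL pattern: a scheme, a colon, then everything up to whitespace.
// Backslashes are included, so malformed URLs like the one reported still match.
const URL_LIKE = /\b(?:https?|ftp|file|mailto):\S+/g;

function wikiWordsOutsideUrls(text) {
  // Replace each URL with a space so the words around it stay separate.
  const withoutUrls = text.replace(URL_LIKE, " ");
  return [...new Set(withoutUrls.match(WIKIWORD) || [])];
}

// The reported URL, written as a JavaScript string literal.
const reported = "See http:\\\\domain.com\\SomeWord\\index.html or OtherPage";
console.log(reported.match(WIKIWORD));       // includes SomeWord
console.log(wikiWordsOutsideUrls(reported)); // only OtherPage
```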

Changed 7 years ago by JeremyRuston

  • status changed from new to assigned
  • summary changed from Mentioned that URLs may also introduce WikiWord links to Missing WikiWord links can wrongly be detected in URLs
  • milestone set to 2.1

Changed 7 years ago by JeremyRuston

  • status changed from assigned to closed
  • resolution set to fixed

Fixed in changeset:333

Changed 7 years ago by JeremyRuston

  • status changed from closed to reopened
  • resolution fixed deleted
  • summary changed from Missing WikiWord links can wrongly be detected in URLs to Erroneously detected missing links

I've reopened the ticket and broadened the description because there are still several cases that need to be fixed:

  • Ignoring links inside monospaced blocks

Changed 7 years ago by MartinBudden

  • owner changed from JeremyRuston to MartinBudden
  • status changed from reopened to new

Changed 7 years ago by MartinBudden

  • milestone changed from 2.1 to 2.2

Changed 7 years ago by MartinBudden

  • milestone changed from 2.2 to 2.3

Changed 6 years ago by MartinBudden

  • milestone changed from 2.3 to soon

Changed 5 years ago by EricShulman

http://www.TiddlyTools.com/#CoreTweaks addresses some of the issues with WikiWords embedded in non-wiki (a.k.a. "quoted") content:

Specifically, before invoking the tiddler.changed() function, the tweak 'filters' the tiddler content so that text inside certain *non-wikified* blocks is not scanned for WikiWords:

* comments: /%...%/
* code/pre (monospaced blocks): {{{...}}}
* unformatted text blocks: """...""" and <nowiki>...</nowiki>
* HTML blocks: <html>...</html>
* inline JavaScript blocks: <script>...</script>

This eliminates the vast majority (but not all) of the cases where unintended WikiWords are treated as missing tiddlers.
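The filtering step can be tried outside TiddlyWiki in a few lines. This is a sketch only: `QUOTED_BLOCKS`, `stripQuotedText`, `candidateLinks` and the simplified `WIKIWORD` pattern are hypothetical names, and unlike the hijack code quoted in this ticket it replaces each block with a space instead of the empty string.

```javascript
// The non-wikified ("quoted") blocks from the list above, one regexp each.
// [\s\S]*? lazily matches any characters, newlines included.
const QUOTED_BLOCKS = [
  /\/%[\s\S]*?%\//g,             // comments:      /%...%/
  /\{\{\{[\s\S]*?\}\}\}/g,       // monospaced:    {{{...}}}
  /"""[\s\S]*?"""/g,             // unformatted:   """..."""
  /<nowiki>[\s\S]*?<\/nowiki>/g, // unformatted:   <nowiki>...</nowiki>
  /<html>[\s\S]*?<\/html>/g,     // HTML:          <html>...</html>
  /<script[\s\S]*?<\/script>/g,  // inline script: <script>...</script>
];

// Simplified WikiWord pattern (hypothetical, not TiddlyWiki's real regexp).
const WIKIWORD = /\b[A-Z][a-z]+(?:[A-Z][a-z]*)+\b/g;

function stripQuotedText(text) {
  // Replace each block with a space rather than "", so that the text on either
  // side cannot be glued into a new WikiWord (e.g. Some{{{x}}}Word).
  return QUOTED_BLOCKS.reduce((txt, re) => txt.replace(re, " "), text);
}

function candidateLinks(text) {
  return [...new Set(stripQuotedText(text).match(WIKIWORD) || [])];
}

const source = [
  "Visit HomePage.",
  "/% TodoItem %/",
  "{{{",
  "var CodeWord;",
  "}}}",
  '"""RawText""" <nowiki>NoWiki</nowiki>',
  "<html><b>HtmlWord</b></html>",
  "<script>var ScriptWord;</script>",
].join("\n");

console.log(candidateLinks(source)); // only HomePage
```

The space replacement matters because removing `{{{x}}}` from `Some{{{x}}}Word` with an empty string yields `SomeWord`, a WikiWord that was never in the source.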

Changed 5 years ago by EricShulman

  • milestone changed from soon to 2.5

Based on discussion in the "Developer's Conference Call" on Nov 10, 2008, this change should be accepted for the next release. Here's the hijack code from TiddlyTools' CoreTweaks:

Tiddler.prototype.coreTweaks_changed = Tiddler.prototype.changed;
Tiddler.prototype.changed = function()
{
	var savedtext=this.text;
	try {
		// remove 'quoted' text before scanning tiddler source
		this.text=this.text.replace(/\/%((?:.|\n)*?)%\//g,""); // /%...%/
		this.text=this.text.replace(/\{{3}((?:.|\n)*?)\}{3}/g,""); // {{{...}}}
		this.text=this.text.replace(/"{3}((?:.|\n)*?)"{3}/g,""); // """..."""
		this.text=this.text.replace(/\<nowiki\>((?:.|\n)*?)\<\/nowiki\>/g,""); // <nowiki>...</nowiki>
		this.text=this.text.replace(/\<html\>((?:.|\n)*?)\<\/html\>/g,""); // <html>...</html>
		this.text=this.text.replace(/\<script((?:.|\n)*?)\<\/script\>/g,""); // <script>...</script>
		this.coreTweaks_changed.apply(this,arguments);
	} finally {
		// restore quoted text to tiddler source, even if the core function throws
		this.text=savedtext;
	}
};

(obviously, this would be re-written without the hijack when implemented in the core)

Changed 4 years ago by FND

  • milestone changed from 2.5 to 2.4.2

Changed 4 years ago by EricShulman

To make this even easier to add, here's the complete code for the updated Tiddler.prototype.changed function. Basically, it copies the tiddler's text into a temporary variable, txt, and uses regexp filtering to remove the 'quoted text' portions of the source. Any references to this.text in the remainder of the function are then changed to txt, so that it operates on the filtered text. Note: this is somewhat more efficient than the hijack code, which had to juggle the tiddler's .text property directly in order to effect the change from outside the core code. Given that the changed() function may be invoked for every tiddler in a large document, and can be re-invoked often, every little bit helps :-)

Tiddler.prototype.changed = function()
{
	this.links = [];
	var t = this.autoLinkWikiWords() ? 0 : 1;
	var tiddlerLinkRegExp = t==0 ? config.textPrimitives.tiddlerAnyLinkRegExp : config.textPrimitives.tiddlerForcedLinkRegExp;
	tiddlerLinkRegExp.lastIndex = 0;
	var txt=this.text;
	txt=txt.replace(/\/%((?:.|\n)*?)%\//g,""); // comments
	txt=txt.replace(/\{{3}((?:.|\n)*?)\}{3}/g,""); // pre
	txt=txt.replace(/"{3}((?:.|\n)*?)"{3}/g,""); // nowiki
	txt=txt.replace(/\<nowiki\>((?:.|\n)*?)\<\/nowiki\>/g,""); // nowiki
	txt=txt.replace(/\<html\>((?:.|\n)*?)\<\/html\>/g,""); // html
	txt=txt.replace(/\<script((?:.|\n)*?)\<\/script\>/g,""); // script
	var formatMatch = tiddlerLinkRegExp.exec(txt);
	while(formatMatch) {
		var lastIndex = tiddlerLinkRegExp.lastIndex;
		if(t==0 && formatMatch[1] && formatMatch[1] != this.title) {
			// wikiWordLink
			if(formatMatch.index > 0) {
				var preRegExp = new RegExp(config.textPrimitives.unWikiLink+"|"+config.textPrimitives.anyLetter,"mg");
				preRegExp.lastIndex = formatMatch.index-1;
				var preMatch = preRegExp.exec(txt);
				if(preMatch.index != formatMatch.index-1)
					this.links.pushUnique(formatMatch[1]);
			} else {
				this.links.pushUnique(formatMatch[1]);
			}
		}
		else if(formatMatch[2-t] && !config.formatterHelpers.isExternalLink(formatMatch[3-t])) // titledBrackettedLink
			this.links.pushUnique(formatMatch[3-t]);
		else if(formatMatch[4-t] && formatMatch[4-t] != this.title) // brackettedLink
			this.links.pushUnique(formatMatch[4-t]);
		tiddlerLinkRegExp.lastIndex = lastIndex;
		formatMatch = tiddlerLinkRegExp.exec(txt);
	}
	this.linksUpdated = true;
};
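The `preRegExp` step in the wikiWordLink branch is easy to misread: it searches for either the unWikiLink character or a letter, starting one position before the match, and records the WikiWord only if that first hit is not the preceding character. The sketch below isolates that check. `UN_WIKI_LINK`, `ANY_LETTER` and the WikiWord pattern are simplified stand-ins assumed for illustration; the real `config.textPrimitives` values may differ.

```javascript
// Assumed stand-ins for config.textPrimitives.unWikiLink and .anyLetter
// (simplified for illustration; the real values may differ).
const UN_WIKI_LINK = "~";
const ANY_LETTER = "[A-Za-z0-9_\\-]";

function wikiWordLinks(txt) {
  // Simplified WikiWord pattern with no leading \b, so that a WikiWord embedded
  // in a longer word is still matched and must be rejected by the check below.
  const wikiWord = /[A-Z][a-z]+(?:[A-Z][a-z]*)+/g;
  const links = [];
  let m;
  while ((m = wikiWord.exec(txt)) !== null) {
    if (m.index > 0) {
      // As in the core code: search for "~" or a letter, starting one character
      // before the match. If the first hit is that character, the WikiWord is
      // escaped (~WikiWord) or part of a longer word, so it is not a link.
      const pre = new RegExp(UN_WIKI_LINK + "|" + ANY_LETTER, "mg");
      pre.lastIndex = m.index - 1;
      const preMatch = pre.exec(txt); // never null: the match itself starts with a letter
      if (preMatch.index === m.index - 1) continue;
    }
    if (!links.includes(m[0])) links.push(m[0]); // like pushUnique()
  }
  return links;
}

console.log(wikiWordLinks("HomePage ~NotALink xEmbeddedWord SeeAlso")); // HomePage and SeeAlso only
```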

Changed 4 years ago by MartinBudden

  • status changed from new to closed
  • resolution set to fixed

Fixed in changeset:7962
