Ticket #2272 (closed Bug: fixed)

Opened 21 months ago

Last modified 21 months ago

FF3: Paste from word leaves lots of garbage tags

Reported by: joshclark
Owned by: martinkou
Priority: Normal
Milestone: FCKeditor 2.6.3
Component: UI : Dialogs
Version: FCKeditor 2.6.2
Keywords: Confirmed Firefox3 Review+
Cc:

Description

In Firefox 3 RC3, the "paste from word" feature leaves lots of garbage tags behind. Specifically:

* It does not remove comments.
* It does not remove <style> elements.
* It does not remove <meta> elements.
* It does not remove <link> elements.

The comments issue can be fixed by changing this line in fck_paste.html's CleanWord function:

html = html.replace(/<\!--.*?-->/g, '' ) ;

...to this:

html = html.replace(/<\!--[\s\S]*?-->/g, '' ) ;

(Because . does not match newline characters, multi-line comments are not removed; [\s\S] matches any character, newlines included, so it does the trick instead.)
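A minimal sketch of the difference, using a made-up two-line Word comment as input (the sample string and variable names are illustrative and not taken from fck_paste.html):

```javascript
// Hypothetical sample: a conditional comment that spans two lines.
var sample = 'before<!--[if gte mso 9]>\n<xml></xml><![endif]-->after' ;

// "." stops at the line break, so the multi-line comment survives.
var withDot = sample.replace( /<\!--.*?-->/g, '' ) ;

// "[\s\S]" matches any character, including newlines.
var withClass = sample.replace( /<\!--[\s\S]*?-->/g, '' ) ;

console.log( withDot === sample ) ;	// true
console.log( withClass ) ;		// beforeafter
```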

To be safe, I recommend making similar changes to all of the fck_paste.html instances where .* is used. Specifically, these lines:

html = html.replace(/<o:p>.*?<\/o:p>/g, '&nbsp;') ;
html = html.replace( /<SPAN\s*>(.*?)<\/SPAN>/gi, '$1' ) ;

html = html.replace( /<FONT\s*>(.*?)<\/FONT>/gi, '$1' ) ;
html = html.replace( /<(\w+)[^>]*\sstyle="[^"]*DISPLAY\s?:\s?none(.*?)<\/\1>/ig, '' ) ;
html = html.replace( /<(H\d)><FONT[^>]*>(.*?)<\/FONT><\/\1>/gi, '<$1>$2<\/$1>' );
html = html.replace( /<(H\d)><EM>(.*?)<\/EM><\/\1>/gi, '<$1>$2<\/$1>' );
var re = new RegExp( '(<P)([^>]*>.*?)(<\/P>)', 'gi' ) ;	// Different because of a IE 5.0 error

...should be changed respectively to:

html = html.replace(/<o:p>[\s\S]*?<\/o:p>/g, '&nbsp;') ;
html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;

html = html.replace( /<FONT\s*>([\s\S]*?)<\/FONT>/gi, '$1' ) ;
html = html.replace( /<(\w+)[^>]*\sstyle="[^"]*DISPLAY\s?:\s?none([\s\S]*?)<\/\1>/ig, '' ) ;
html = html.replace( /<(H\d)><FONT[^>]*>([\s\S]*?)<\/FONT><\/\1>/gi, '<$1>$2<\/$1>' );
html = html.replace( /<(H\d)><EM>([\s\S]*?)<\/EM><\/\1>/gi, '<$1>$2<\/$1>' );
var re = new RegExp( '(<P)([^>]*>[\\s\\S]*?)(<\/P>)', 'gi' ) ;	// Different because of a IE 5.0 error
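One caveat for the RegExp constructor form: the pattern is a string literal, so the string parser processes backslashes before the regex engine sees them, and \s inside the string has to be written as \\s. A minimal sketch, assuming a hypothetical two-line paragraph as input:

```javascript
// In a string literal "\s" is just "s", so the class collapses to [sS].
var fromString = new RegExp( '(<P)([^>]*>[\s\S]*?)(<\/P>)', 'gi' ) ;

// Doubling the backslashes preserves them for the regex engine.
var escaped = new RegExp( '(<P)([^>]*>[\\s\\S]*?)(<\/P>)', 'gi' ) ;

// Hypothetical sample paragraph that spans two lines.
var para = '<P class=MsoNormal>line one\nline two</P>' ;

var singleBackslashMatches = fromString.test( para ) ;
var doubledBackslashMatches = escaped.test( para ) ;

console.log( fromString.source ) ;		// the class appears as [sS]
console.log( singleBackslashMatches ) ;		// false
console.log( doubledBackslashMatches ) ;	// true
```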

Also, to get rid of the <meta>, <link> and <style> elements, I suggest adding these additional replacements:

// Remove meta/link tags
html = html.replace(/<(META|LINK)[^>]*>\s*/gi, '' ) ;

// Remove style tags
html = html.replace( /<STYLE[^>]*>([\s\S]*?)<\/STYLE[^>]*>/gi, '' ) ;
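As a rough illustration, here are the two replacements applied to a made-up clipboard fragment (the sample markup and the variable name pasted are assumptions, not output captured from Word):

```javascript
// Hypothetical fragment resembling the head markup Word puts on the clipboard.
var pasted =
	'<meta name=Generator content="Microsoft Word 12">\n' +
	'<link rel=File-List href="filelist.xml">\n' +
	'<style>\n<!--\n p.MsoNormal { margin:0cm; }\n-->\n</style>\n' +
	'<p class=MsoNormal>Hello</p>' ;

// Remove meta/link tags
pasted = pasted.replace( /<(META|LINK)[^>]*>\s*/gi, '' ) ;

// Remove style tags
pasted = pasted.replace( /<STYLE[^>]*>([\s\S]*?)<\/STYLE[^>]*>/gi, '' ) ;

// Only the paragraph (preceded by one leftover newline) remains.
console.log( pasted ) ;
```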

Attachments

2272.patch (3.5 KB) - added by martinkou 21 months ago.
2272_2.patch (3.7 KB) - added by martinkou 21 months ago.

Change History

Changed 21 months ago by w.olchawa

  • keywords Confirmed Firefox3 added
  • version set to FCKeditor 2.6.2

Confirmed using FCKeditor 2.6.2 and the latest SVN version in Firefox 3.

Changed 21 months ago by fredck

  • milestone set to FCKeditor 2.6.3

#2291 has been marked as a duplicate. The following regex was also proposed there:

html = html.replace( /<w:[^>]*>[\s\S]*?<\/w:[^>]*>/gi, '' ) ;
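A small sketch of what this replacement does, assuming a hypothetical, non-nested w: element that spans several lines (the element name and sample text are made up for illustration):

```javascript
// Hypothetical sample with a Word-namespaced element spanning several lines.
var wordMarkup = '<p>a<w:Sdt>\nfield\n</w:Sdt>b</p>' ;

// Strip the w: element together with its content.
var stripped = wordMarkup.replace( /<w:[^>]*>[\s\S]*?<\/w:[^>]*>/gi, '' ) ;

console.log( stripped ) ;	// <p>ab</p>
```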

Also, many of the proposed replacements define capturing groups, like ([\s\S]*?), which have a performance cost. They can be avoided in many cases, and where grouping is needed but the match does not have to be captured, the non-capturing (?:...) syntax should be used.
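The three cases could look roughly like this (a sketch with made-up sample strings; the variable names are illustrative and no performance difference is measured here):

```javascript
// Capturing is needed here: the replacement string uses $1 and $2.
var heading = '<H1><EM>Title\ntext</EM></H1>' ;
var kept = heading.replace( /<(H\d)><EM>([\s\S]*?)<\/EM><\/\1>/gi, '<$1>$2<\/$1>' ) ;

// No group at all: the whole match is discarded, so nothing is captured.
var styleBlock = '<STYLE type="text/css">\np { margin:0 }\n</STYLE>x' ;
var noGroup = styleBlock.replace( /<STYLE[^>]*>[\s\S]*?<\/STYLE[^>]*>/gi, '' ) ;

// Non-capturing: (?:...) groups the alternation without storing the match.
var headTags = '<meta charset="x"><LINK rel="y">z' ;
var nonCapturing = headTags.replace( /<(?:META|LINK)[^>]*>\s*/gi, '' ) ;

console.log( kept ) ;		// <H1>Title\ntext</H1> (with a real line break)
console.log( noGroup ) ;	// x
console.log( nonCapturing ) ;	// z
```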

Changed 21 months ago by martinkou

  • owner set to martinkou
  • status changed from new to assigned

Changed 21 months ago by martinkou

Changed 21 months ago by martinkou

  • keywords Review? added

Changed 21 months ago by fredck

  • keywords Review- added; Review? removed

The regex at comment:2 should also be considered, shouldn't it?

Changed 21 months ago by martinkou

Changed 21 months ago by martinkou

  • keywords Review? added; Review- removed

Changed 21 months ago by fredck

  • keywords Review+ added; Review? removed

Changed 21 months ago by martinkou

  • status changed from assigned to closed
  • resolution set to fixed

Fixed with [2174].
