Bug 64495 - Writer 4 cannot find regular expressions like \xAD or \x00AB
Summary: Writer 4 cannot find regular expressions like \xAD or \x00AB
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 4.0.0.3 release
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Michael Stahl
QA Contact:
URL:
Whiteboard: BSA target:4.2.0 target:4.1.4 target:...
Keywords: regression
Duplicates: 63261
Depends on:
Blocks:
 
Reported: 2013-05-12 15:32 UTC by AndreHasekamp
Modified: 2013-11-16 13:42 UTC
CC List: 5 users

Description AndreHasekamp 2013-05-12 15:32:39 UTC
Problem description: 

Steps to reproduce:
1. Open any .odt document which contains one or more custom-hyphens (soft-hyphens) in LibreOffice 4 or higher (e.g. LibreOffice 4.0.3.3).
2. Do Edit | Find & Replace. Search for: \x00AD. More options: turn the checkbox "Regular Expressions" on.
3. Press Find.

Current behavior: No custom-hyphens will be found.

Expected behavior: Custom-hyphens should be found, as e.g. in Writer 3.5.1.

(In practice I have a little BASIC program which removes soft-hyphens from a document, but it no longer works since LibreOffice 4; so, for the time being, I have to go to Tools | Options | LibreOffice Writer | Formatting Aids and turn the checkbox "Custom hyphens" off.)

In a previous version of the Bug Assistant I could browse through the bugs to see whether the same bug or a similar one had already been submitted. Sorry, I could not find that option this time, so I simply had to submit the bug.

Kind regards,
Andre Hasekamp.


              
Operating System: Linux (Other)
Version: 4.0.3.3 release
Last worked in: 3.5.1 release
Comment 1 Urmas 2013-05-12 16:07:30 UTC
I don't remember that \x combinations have ever worked.
Comment 2 GerardF 2013-05-12 19:05:52 UTC
(In reply to comment #1)
> I don't remember that \x combinations have ever worked.

It worked with OOo and with LibreOffice until 4.0.x (x?).
Works with 3.6.6, fails with 4.0.1.
Comment 3 Jacques Guilleron 2013-05-12 20:59:37 UTC
Hi,

The ICU regexp engine is a new feature in LO 4; it replaces the custom engine. See:
http://www.libreoffice.org/download/4-0-new-features-and-fixes/ under Options/General, which links to:
http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters.

Have a nice day,
Jacques Guilleron
Comment 4 Jacques Guilleron 2013-05-13 09:14:58 UTC
There is a difference with hex values in Find & Replace: if I now (in LO 4.0.2.2) enter \x00AB, the character is not found, but \xAB is. Unfortunately, \xAD does not work. I can find that character only if I enter it directly. There are perhaps other differences.

Jacques Guilleron
Comment 5 AndreHasekamp 2013-05-13 23:11:31 UTC
Hi,

I'm afraid I'll have to study this ICU document first; never seen it before.

So, consider this bug 64495 withdrawn.

Kind regards,
Andre Hasekamp.

Comment 6 Michael Stahl 2013-05-17 13:58:09 UTC
Eike, is there any problem here with "\x..." regex search that needs fixing?
Comment 7 Eike Rathke 2013-05-17 18:44:05 UTC
The Help needs to be updated. The ICU regular expressions differ slightly in detail from the home-brewed OOo expressions. In this example, four hex digits following \x are not accepted: \x takes exactly two hex digits, for values <= 255, so \xhh. One to six hex digits are accepted in the form \x{hhhhhh}. These two forms are identical to the ones in Perl regular expressions. As an ICU Unicode extension, the form \uhhhh with exactly four hex digits can also be used, or \Uhhhhhhhh with exactly eight hex digits. For more details see the metacharacters URL mentioned above: http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters

As for \xAD, which according to comment 4 does not work, I'm not sure: is a soft-hyphen even part of the text? Isn't it only generated by word breaking? Does \u00AD find it? On the other hand, I spotted some special treatment of 0x00AD in sw/source/core/crsr/findtxt.cxx, SwPaM::DoSearch(), which for bRegSearch is supposed to set bRemoveSoftHyphens = false;
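
For anyone with old search strings or macros to migrate, here is a minimal sketch of rewriting the legacy four-digit form into the braced form described above. It is plain Python and only manipulates the pattern text; it does not call LibreOffice or ICU. The function name to_icu_hex is hypothetical, and the sketch assumes every \x followed by four hex digits was meant as one legacy code point (ICU itself would read \x00AD as \x00 followed by the letters "AD").

```python
import re

# Legacy OOo form: \x followed by exactly four hex digits.
# The lookbehind plus the first group skip over escaped backslashes,
# so a literal "\\x00AD" (backslash, then the text x00AD) is left alone.
_LEGACY_HEX = re.compile(r"(?<!\\)((?:\\\\)*)\\x([0-9A-Fa-f]{4})")


def to_icu_hex(pattern: str) -> str:
    """Rewrite legacy \\xHHHH escapes as \\x{HHHH}; leave everything else unchanged."""
    return _LEGACY_HEX.sub(
        lambda m: m.group(1) + "\\x{" + m.group(2) + "}", pattern
    )


if __name__ == "__main__":
    for old in (r"\x00AD", r"a\x00ABb", r"\xAD", r"\x{00AD}"):
        print(old, "->", to_icu_hex(old))
```

A pattern that was already written for ICU and happens to contain \xhh followed by two more hex digits would be changed as well, so the rewrite is only safe for patterns known to date from the old engine.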
Comment 8 Michael Stahl 2013-10-11 20:09:00 UTC
*** Bug 63261 has been marked as a duplicate of this bug. ***
Comment 9 Commit Notification 2013-10-12 00:17:07 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=dca5163b6ef206ceb1f2d56feb7546c1929afe60

fdo#64495: sw: fix regex search for soft hyphen \xAD



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 10 Commit Notification 2013-10-14 10:59:49 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-4-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=386d0c5d663fe50295be3714977a54b86212f766&h=libreoffice-4-1

fdo#64495: sw: fix regex search for soft hyphen \xAD


It will be available in LibreOffice 4.1.4.
Comment 11 Commit Notification 2013-10-14 11:11:38 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6add0104e250fd8653a93450d371404aa3ff3a6c&h=libreoffice-4-0

fdo#64495: sw: fix regex search for soft hyphen \xAD


It will be available in LibreOffice 4.0.7.
Comment 12 Commit Notification 2013-10-14 11:12:56 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-4-0-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=730c5696c6c668c88ed071fed6f3598f0b4a2aa1&h=libreoffice-4-0-6

fdo#64495: sw: fix regex search for soft hyphen \xAD


It will be available already in LibreOffice 4.0.6.
Comment 13 Michael Stahl 2013-10-21 18:07:16 UTC
so it turned out that the soft-hyphen (\xAD) needs special handling
code in Writer, which is fixed now.

you can use any of the ICU supported Unicode literal syntax, e.g.
 \xAD
 \x{00AD}
 \u00AD
 \U000000AD
 \N{SOFT HYPHEN}

but the legacy syntax \x00AD is no longer supported and that
will not be fixed.

have now adapted the help content on master accordingly to document \uXXXX.
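
To illustrate that the five spellings listed above name the same character, here is a small sketch in plain Python. It does not use ICU; decode_literal is a hypothetical helper that parses only the literal forms mentioned in this report and returns the character they denote, and it rejects the legacy \x00AD form.

```python
import re
import unicodedata

# One alternative per literal form: \xhh, \x{h...}, \uhhhh, \Uhhhhhhhh, \N{NAME}.
_LITERAL = re.compile(
    r"\\x([0-9A-Fa-f]{2})"
    r"|\\x\{([0-9A-Fa-f]{1,6})\}"
    r"|\\u([0-9A-Fa-f]{4})"
    r"|\\U([0-9A-Fa-f]{8})"
    r"|\\N\{([^}]+)\}"
)


def decode_literal(escape: str) -> str:
    """Return the single character denoted by one escape, or raise ValueError."""
    m = _LITERAL.fullmatch(escape)
    if m is None:
        raise ValueError(f"not a supported literal form: {escape!r}")
    *hex_groups, name = m.groups()
    if name is not None:
        # Look the character up by its Unicode name; unknown names raise KeyError.
        return unicodedata.lookup(name)
    digits = next(g for g in hex_groups if g is not None)
    return chr(int(digits, 16))


if __name__ == "__main__":
    forms = [r"\xAD", r"\x{00AD}", r"\u00AD", r"\U000000AD", r"\N{SOFT HYPHEN}"]
    for form in forms:
        print(form, "-> U+%04X" % ord(decode_literal(form)))
```

Passing the legacy form \x00AD raises ValueError here, which mirrors the decision above that this syntax is no longer supported.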
Comment 14 Commit Notification 2013-10-21 18:16:52 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/help/commit/?id=f81edbd66fc4d0b6cf03949bb2339c9be9ee989c

fdo#64495: help: regex \xXXXX is no longer supported
Comment 15 Jacques Guilleron 2013-11-16 07:22:29 UTC
Hello,

Verified with LO 4.2.0.0.alpha0+
Build ID: 71e1c79acebab5fc6a31457416c24c4a33141c33
TinderBox: Win-x86@42, Branch:master, Time: 2013-10-27_23:53:26

Thank you, Michael, for the time spent on fixing this.

Jacques
Comment 16 foss 2013-11-16 13:42:02 UTC
setting to verified as of comment #15.