newbie raw text question
Ian Sparks
Ian.Sparks at etrials.com
Tue Feb 4 09:44:26 EST 2003
More information about the Python-list mailing list
Thanks for the reply Dennis. Your breakdown of the meaning of the RTF codes is pretty much spot on. However, I'm still not "getting it".

You say:

>>
What escaped characters? The \ is a tag introducer (for lack of a
better word) and is part of the actual data. "\rtf1" is NOT <cr>tf1.
<<

So here's a simple command-line test:

>>> print "\rtf1"
tf1
>>> print r"\rtf1"
\rtf1
>>>

Looks to me like \rtf1 *is* <cr>tf1 unless you define the string as a raw string, and then it can contain the "\" character.

This is all very well for strings you define at the command line, but what if a variable "x" contains "\rtf1" (NOT a raw string)? Now how can you deal with it?

>>> print x
tf1
>>> print rx #attempt to turn x into a raw string for printing.
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
NameError: name 'rx' is not defined
>>>

How can I print x as though it were a raw string?

Like I said, it's probably pretty obvious, I just don't "get it".

-----Original Message-----
From: Dennis Lee Bieber [mailto:wlfraed at ix.netcom.com]
Sent: Monday, February 03, 2003 11:33 PM
To: python-list at python.org
Subject: Re: newbie raw text question

Ian Sparks fed this fish to the penguins on Monday 03 February 2003 12:11 pm:

> I'm confused about this one. I'm reading some RTF formatted data from
> a database. The resulting string is :
>
> {\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans
> {Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\fswiss Arial;}{\f3\fswiss
> {Arial;}} \colortbl\red0\green0\blue0;}
> \deflang1033\pard\plain\f3\fs16 Some text
> }
>
> obviously this is chock-full of escaped characters. I need to strip
> the RTF codes and all my regular expressions are expecting raw strings
> but I don't see a way of converting an escaped string to a raw string
> to use in the regex.
>

What escaped characters? The \ is a tag introducer (for lack of a better word) and is part of the actual data. "\rtf1" is NOT <cr>tf1.
What I see in your sample (and I've not studied RTF) is:

RTF version 1 (hypothetical this)
ANSI
Codepage 1252
define font 0 (guessing)
define tab 720 decipoints (1inch)(guessing, might be centipoints/0.1inch)
font table
    font 0 "swiss" font (sans serif) is MS Sans Serif
    font 1 "roman" font (serif) is character set 2 Symbol
    font 2 "swiss" font is Arial
    font 3 "swiss" font is Arial
color table
    red 0 green 0 blue 0
define language 1033
????
plain (not bold or italic)
use font 3
font size 16

> There must be some way out of here...
>
>
--
> ============================================================== <
> wlfraed at ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
> wulfraed at dm.net | Bestiaria Support Staff <
> ============================================================== <
> Bestiaria Home Page: http://www.beastie.dm.net/ <
> Home Page: http://www.dm.net/~wulfraed/ <
--
http://mail.python.org/mailman/listinfo/python-list
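
The difference the two posters are circling around can be shown in a few lines. Backslash escapes are processed only when Python compiles a string *literal*; a string that arrives at run time (from a file, a socket, a database) keeps its backslashes exactly as stored, so there is nothing to "convert to raw". The sketch below is written for current Python 3 (print as a function) rather than the 2003 print statement used in the session above. The names cooked, raw, source and control_word are made up for illustration, io.StringIO merely stands in for the database, and the regular expression is a rough guess at matching RTF control words, not a real RTF parser.

```python
import io
import re

# In a string *literal*, "\r" is a single carriage-return character,
# while the raw literal r"\r" is a backslash followed by "r".
cooked = "\rtf1"
raw = r"\rtf1"
print(len(cooked), len(raw))   # 4 5

# Data read at run time is never escape-processed. io.StringIO stands in
# for the database here (an assumption to keep the example runnable).
source = io.StringIO(r"{\rtf1\ansi\deflang1033\pard\plain\f3\fs16 Some text}")
x = source.read()
print(x[:6])                   # {\rtf1
print(repr(x[:6]))             # '{\\rtf1'  (repr shows the backslash escaped)

# The r"..." prefix on the *pattern* only spares us doubling backslashes in
# the pattern literal; the data needs no special treatment.
# Rough pattern: backslash, letters, optional number, optional trailing space.
control_word = re.compile(r"\\[a-z]+-?\d* ?")
stripped = control_word.sub("", x).strip("{}")
print(stripped)                # Some text
```

So if "print x" really shows "tf1", the value of x was most likely typed in as a non-raw literal during testing; the same text fetched from the database would still contain the backslash.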