Initial class construction

This commit is contained in:
João Narciso
2019-05-06 16:34:28 +02:00
parent 67f2d57e03
commit 431ff5f7d4
5813 changed files with 1622108 additions and 0 deletions

View File

@ -0,0 +1,672 @@
package MIME::Body;
=head1 NAME
MIME::Body - the body of a MIME message
=head1 SYNOPSIS
Before reading further, you should see L<MIME::Tools> to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.
Ready? Ok...
=head2 Obtaining bodies
### Get the bodyhandle of a MIME::Entity object:
$body = $entity->bodyhandle;
### Create a body which stores data in a disk file:
$body = new MIME::Body::File "/path/to/file";
### Create a body which stores data in an in-core array:
$body = new MIME::Body::InCore \@strings;
=head2 Opening, closing, and using IO handles
### Write data to the body:
$IO = $body->open("w") || die "open body: $!";
$IO->print($message);
$IO->close || die "close I/O handle: $!";
### Read data from the body (in this case, line by line):
$IO = $body->open("r") || die "open body: $!";
while (defined($_ = $IO->getline)) {
### do stuff
}
$IO->close || die "close I/O handle: $!";
=head2 Other I/O
### Dump the ENCODED body data to a filehandle:
$body->print(\*STDOUT);
### Slurp all the UNENCODED data in, and put it in a scalar:
$string = $body->as_string;
### Slurp all the UNENCODED data in, and put it in an array of lines:
@lines = $body->as_lines;
=head2 Working directly with paths to underlying files
### Where's the data?
if (defined($body->path)) { ### data is on disk:
print "data is stored externally, in ", $body->path;
}
else { ### data is in core:
print "data is already in core, and is...\n", $body->as_string;
}
### Get rid of anything on disk:
$body->purge;
=head1 DESCRIPTION
MIME messages can be very long (e.g., tar files, MPEGs, etc.) or very
short (short textual notes, as in ordinary mail). Long messages
are best stored in files, while short ones are perhaps best stored
in core.
This class is an attempt to define a common interface for objects
which contain message data, regardless of how the data is
physically stored. The lifespan of a "body" object
usually looks like this:
=over 4
=item 1.
B<Body object is created by a MIME::Parser during parsing.>
It's at this point that the actual MIME::Body subclass is chosen,
and new() is invoked. (For example: if the body data is going to
a file, then it is at this point that the class MIME::Body::File,
and the filename, is chosen).
=item 2.
B<Data is written to the body> (usually by the MIME parser) like this:
The body is opened for writing, via C<open("w")>. This will trash any
previous contents, and return an "I/O handle" opened for writing.
Data is written to this I/O handle, via print().
Then the I/O handle is closed, via close().
=item 3.
B<Data is read from the body> (usually by the user application) like this:
The body is opened for reading by a user application, via C<open("r")>.
This will return an "I/O handle" opened for reading.
Data is read from the I/O handle, via read(), getline(), or getlines().
Then the I/O handle is closed, via close().
=item 4.
B<Body object is destructed.>
=back
You can write your own subclasses, as long as they follow the
interface described below. Implementers of subclasses should assume
that steps 2 and 3 may be repeated any number of times, and in
different orders (e.g., 1-2-2-3-2-3-3-3-3-3-2-4).
In any case, once a MIME::Body has been created, you ask to open it
for reading or writing, which gets you an "i/o handle": you then use
the same mechanisms for reading from or writing to that handle, no matter
what class it is.
Beware: unless you know for certain what kind of body you have, you
should I<not> assume that the body has an underlying filehandle.
=head1 PUBLIC INTERFACE
=over 4
=cut
### Pragmas:
use strict;
use vars qw($VERSION);
### System modules:
use Carp;
use IO::File;
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
#------------------------------
=item new ARGS...
I<Class method, constructor.>
Create a new body. Any ARGS are sent to init().
=cut
sub new {
my $self = bless {}, shift;
$self->init(@_);
$self;
}
#------------------------------
=item init ARGS...
I<Instance method, abstract, initiallizer.>
This is called automatically by C<new()>, with the arguments given
to C<new()>. The arguments are optional, and entirely up to the
subclass. The default method does nothing,
=cut
sub init { 1 }
#------------------------------
=item as_lines
I<Instance method.>
Return the contents of the body as an array of lines (each terminated
by a newline, with the possible exception of the final one).
Returns empty on failure (NB: indistinguishable from an empty body!).
Note: the default method gets the data via
repeated getline() calls; your subclass might wish to override this.
=cut
sub as_lines {
my $self = shift;
my @lines;
my $io = $self->open("r") || return ();
local $_;
push @lines, $_ while (defined($_ = $io->getline()));
$io->close;
@lines;
}
#------------------------------
=item as_string
I<Instance method.>
Return the body data as a string (slurping it into core if necessary).
Best not to do this unless you're I<sure> that the body is reasonably small!
Returns empty string for an empty body, and undef on failure.
Note: the default method uses print(), which gets the data via
repeated read() calls; your subclass might wish to override this.
=cut
sub as_string {
my $self = shift;
my $str = '';
my $fh = IO::File->new(\$str, '>:') or croak("Cannot open in-memory file: $!");
$self->print($fh);
close($fh);
return $str;
}
*data = \&as_string; ### silently invoke preferred usage
#------------------------------
=item binmode [ONOFF]
I<Instance method.>
With argument, flags whether or not open() should return an I/O handle
which has binmode() activated. With no argument, just returns the
current value.
=cut
sub binmode {
my ($self, $onoff) = @_;
$self->{MB_Binmode} = $onoff if (@_ > 1);
$self->{MB_Binmode};
}
#------------------------------
=item is_encoded [ONOFF]
I<Instance method.>
If set to yes, no decoding is applied on output. This flag is set
by MIME::Parser, if the parser runs in decode_bodies(0) mode, so the
content is handled unmodified.
=cut
sub is_encoded {
my ($self, $yesno) = @_;
$self->{MB_IsEncoded} = $yesno if (@_ > 1);
$self->{MB_IsEncoded};
}
#------------------------------
=item dup
I<Instance method.>
Duplicate the bodyhandle.
I<Beware:> external data in bodyhandles is I<not> copied to new files!
Changing the data in one body's data file, or purging that body,
I<will> affect its duplicate. Bodies with in-core data probably need
not worry.
=cut
sub dup {
my $self = shift;
bless { %$self }, ref($self); ### shallow copy ok for ::File and ::Scalar
}
#------------------------------
=item open READWRITE
I<Instance method, abstract.>
This should do whatever is necessary to open the body for either
writing (if READWRITE is "w") or reading (if mode is "r").
This method is expected to return an "I/O handle" object on success,
and undef on error. An I/O handle can be any object that supports a
small set of standard methods for reading/writing data.
See the IO::Handle class for an example.
=cut
sub open {
undef;
}
#------------------------------
=item path [PATH]
I<Instance method.>
If you're storing the body data externally (e.g., in a disk file), you'll
want to give applications the ability to get at that data, for cleanup.
This method should return the path to the data, or undef if there is none.
Where appropriate, the path I<should> be a simple string, like a filename.
With argument, sets the PATH, which should be undef if there is none.
=cut
sub path {
my $self = shift;
$self->{MB_Path} = shift if @_;
$self->{MB_Path};
}
#------------------------------
=item print FILEHANDLE
I<Instance method.>
Output the body data to the given filehandle, or to the currently-selected
one if none is given.
=cut
sub print {
my ($self, $fh) = @_;
my $nread;
### Get output filehandle, and ensure that it's a printable object:
$fh ||= select;
### Write it:
my $buf = '';
my $io = $self->open("r") || return undef;
$fh->print($buf) while ($nread = $io->read($buf, 8192));
$io->close;
return defined($nread); ### how'd we do?
}
#------------------------------
=item purge
I<Instance method, abstract.>
Remove any data which resides external to the program (e.g., in disk files).
Immediately after a purge(), the path() should return undef to indicate
that the external data is no longer available.
=cut
sub purge {
1;
}
=back
=head1 SUBCLASSES
The following built-in classes are provided:
Body Stores body When open()ed,
class: data in: returns:
--------------------------------------------------------
MIME::Body::File disk file IO::Handle
MIME::Body::Scalar scalar IO::Handle
MIME::Body::InCore scalar array IO::Handle
=cut
#------------------------------------------------------------
package MIME::Body::File;
#------------------------------------------------------------
=head2 MIME::Body::File
A body class that stores the data in a disk file. Invoke the
constructor as:
$body = new MIME::Body::File "/path/to/file";
In this case, the C<path()> method would return the given path,
so you I<could> say:
if (defined($body->path)) {
open BODY, $body->path or die "open: $!";
while (<BODY>) {
### do stuff
}
close BODY;
}
But you're best off not doing this.
=cut
### Pragmas:
use vars qw(@ISA);
use strict;
### System modules:
use IO::File;
### Kit modules:
use MIME::Tools qw(whine);
@ISA = qw(MIME::Body);
#------------------------------
# init PATH
#------------------------------
sub init {
my ($self, $path) = @_;
$self->path($path); ### use it as-is
$self;
}
#------------------------------
# open READWRITE
#------------------------------
sub open {
my ($self, $mode) = @_;
my $path = $self->path;
if( $mode ne 'r' && $mode ne 'w' ) {
die "bad mode: '$mode'";
}
my $IO = IO::File->new($path, $mode) || die "MIME::Body::File->open $path: $!";
$IO->binmode() if $self->binmode;
return $IO;
}
#------------------------------
# purge
#------------------------------
# Unlink the path (and undefine it).
#
sub purge {
my $self = shift;
if (defined($self->path)) {
unlink $self->path or whine "couldn't unlink ".$self->path.": $!";
$self->path(undef);
}
1;
}
#------------------------------------------------------------
package MIME::Body::Scalar;
#------------------------------------------------------------
=head2 MIME::Body::Scalar
A body class that stores the data in-core, in a simple scalar.
Invoke the constructor as:
$body = new MIME::Body::Scalar \$string;
A single scalar argument sets the body to that value, exactly as though
you'd opened for the body for writing, written the value,
and closed the body again:
$body = new MIME::Body::Scalar "Line 1\nLine 2\nLine 3";
A single array reference sets the body to the result of joining all the
elements of that array together:
$body = new MIME::Body::Scalar ["Line 1\n",
"Line 2\n",
"Line 3"];
=cut
use vars qw(@ISA);
use strict;
use Carp;
@ISA = qw(MIME::Body);
#------------------------------
# init DATA
#------------------------------
sub init {
my ($self, $data) = @_;
$data = join('', @$data) if (ref($data) && (ref($data) eq 'ARRAY'));
$self->{MBS_Data} = (defined($data) ? $data : '');
$self;
}
#------------------------------
# as_string
#------------------------------
sub as_string {
shift->{MBS_Data};
}
#------------------------------
# open READWRITE
#------------------------------
sub open {
my ($self, $mode) = @_;
$self->{MBS_Data} = '' if ($mode eq 'w'); ### writing
if ($mode eq 'w') {
$mode = '>:';
} elsif ($mode eq 'r') {
$mode = '<:';
} else {
die "bad mode: $mode";
}
return IO::File->new(\ $self->{MBS_Data}, $mode);
}
#------------------------------------------------------------
package MIME::Body::InCore;
#------------------------------------------------------------
=head2 MIME::Body::InCore
A body class that stores the data in-core.
Invoke the constructor as:
$body = new MIME::Body::InCore \$string;
$body = new MIME::Body::InCore $string;
$body = new MIME::Body::InCore \@stringarray
A simple scalar argument sets the body to that value, exactly as though
you'd opened for the body for writing, written the value,
and closed the body again:
$body = new MIME::Body::InCore "Line 1\nLine 2\nLine 3";
A single array reference sets the body to the concatenation of all
scalars that it holds:
$body = new MIME::Body::InCore ["Line 1\n",
"Line 2\n",
"Line 3"];
=cut
use vars qw(@ISA);
use strict;
use Carp;
@ISA = qw(MIME::Body::Scalar);
#------------------------------
# init DATA
#------------------------------
sub init {
my ($self, $data) = @_;
if (!defined($data)) { ### nothing
$self->{MBS_Data} = '';
}
elsif (!ref($data)) { ### simple scalar
$self->{MBS_Data} = $data;
}
elsif (ref($data) eq 'SCALAR') {
$self->{MBS_Data} = $$data;
}
elsif (ref($data) eq 'ARRAY') {
$self->{MBS_Data} = join('', @$data);
}
else {
croak "I can't handle DATA which is a ".ref($data)."\n";
}
$self;
}
1;
__END__
#------------------------------
=head2 Defining your own subclasses
So you're not happy with files and scalar-arrays?
No problem: just define your own MIME::Body subclass, and make a subclass
of MIME::Parser or MIME::ParserBase which returns an instance of your
body class whenever appropriate in the C<new_body_for(head)> method.
Your "body" class must inherit from MIME::Body (or some subclass of it),
and it must either provide (or inherit the default for) the following
methods...
The default inherited method I<should suffice> for all these:
new
binmode [ONOFF]
path
The default inherited method I<may suffice> for these, but perhaps
there's a better implementation for your subclass.
init ARGS...
as_lines
as_string
dup
print
purge
The default inherited method I<will probably not suffice> for these:
open
=head1 NOTES
One reason I didn't just use IO::Handle objects for message bodies was
that I wanted a "body" object to be a form of completely encapsulated
program-persistent storage; that is, I wanted users to be able to write
code like this...
### Get body handle from this MIME message, and read its data:
$body = $entity->bodyhandle;
$IO = $body->open("r");
while (defined($_ = $IO->getline)) {
print STDOUT $_;
}
$IO->close;
...without requiring that they know anything more about how the
$body object is actually storing its data (disk file, scalar variable,
array variable, or whatever).
Storing the body of each MIME message in a persistently-open
IO::Handle was a possibility, but it seemed like a bad idea,
considering that a single multipart MIME message could easily suck up
all the available file descriptors on some systems. This risk increases
if the user application is processing more than one MIME entity at a time.
=head1 SEE ALSO
L<MIME::Tools>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
David F. Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
Thanks to Achim Bohnet for suggesting that MIME::Parser not be restricted
to the use of FileHandles.
#------------------------------
1;

View File

@ -0,0 +1,661 @@
package MIME::Decoder;
=head1 NAME
MIME::Decoder - an object for decoding the body part of a MIME stream
=head1 SYNOPSIS
Before reading further, you should see L<MIME::Tools> to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.
Ready? Ok...
=head2 Decoding a data stream
Here's a simple filter program to read quoted-printable data from STDIN
(until EOF) and write the decoded data to STDOUT:
use MIME::Decoder;
$decoder = new MIME::Decoder 'quoted-printable' or die "unsupported";
$decoder->decode(\*STDIN, \*STDOUT);
=head2 Encoding a data stream
Here's a simple filter program to read binary data from STDIN
(until EOF) and write base64-encoded data to STDOUT:
use MIME::Decoder;
$decoder = new MIME::Decoder 'base64' or die "unsupported";
$decoder->encode(\*STDIN, \*STDOUT);
=head2 Non-standard encodings
You can B<write and install> your own decoders so that
MIME::Decoder will know about them:
use MyBase64Decoder;
install MyBase64Decoder 'base64';
You can also B<test> if a given encoding is supported:
if (supported MIME::Decoder 'x-uuencode') {
### we can uuencode!
}
=head1 DESCRIPTION
This abstract class, and its private concrete subclasses (see below)
provide an OO front end to the actions of...
=over 4
=item *
Decoding a MIME-encoded stream
=item *
Encoding a raw data stream into a MIME-encoded stream.
=back
The constructor for MIME::Decoder takes the name of an encoding
(C<base64>, C<7bit>, etc.), and returns an instance of a I<subclass>
of MIME::Decoder whose C<decode()> method will perform the appropriate
decoding action, and whose C<encode()> method will perform the appropriate
encoding action.
=cut
### Pragmas:
use strict;
use vars qw($VERSION %DecoderFor);
### System modules:
use IPC::Open2;
use IO::Select;
use FileHandle;
### Kit modules:
use MIME::Tools qw(:config :msgs);
use Carp;
#------------------------------
#
# Globals
#
#------------------------------
### The stream decoders:
%DecoderFor = (
### Standard...
'7bit' => 'MIME::Decoder::NBit',
'8bit' => 'MIME::Decoder::NBit',
'base64' => 'MIME::Decoder::Base64',
'binary' => 'MIME::Decoder::Binary',
'none' => 'MIME::Decoder::Binary',
'quoted-printable' => 'MIME::Decoder::QuotedPrint',
### Non-standard...
'binhex' => 'MIME::Decoder::BinHex',
'binhex40' => 'MIME::Decoder::BinHex',
'mac-binhex40' => 'MIME::Decoder::BinHex',
'mac-binhex' => 'MIME::Decoder::BinHex',
'x-uu' => 'MIME::Decoder::UU',
'x-uuencode' => 'MIME::Decoder::UU',
### This was removed, since I fear that x-gzip != x-gzip64...
### 'x-gzip' => 'MIME::Decoder::Gzip64',
### This is no longer installed by default, since not all folks have gzip:
### 'x-gzip64' => 'MIME::Decoder::Gzip64',
);
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
### Me:
my $ME = 'MIME::Decoder';
#------------------------------
=head1 PUBLIC INTERFACE
=head2 Standard interface
If all you are doing is I<using> this class, here's all you'll need...
=over 4
=cut
#------------------------------
=item new ENCODING
I<Class method, constructor.>
Create and return a new decoder object which can handle the
given ENCODING.
my $decoder = new MIME::Decoder "7bit";
Returns the undefined value if no known decoders are appropriate.
=cut
sub new {
my ($class, @args) = @_;
my ($encoding) = @args;
### Coerce the type to be legit:
$encoding = lc($encoding || '');
### Get the class:
my $concrete_name = $DecoderFor{$encoding};
if( ! $concrete_name ) {
carp "no decoder for $encoding";
return undef;
}
### Create the new object (if we can):
my $self = { MD_Encoding => lc($encoding) };
unless (eval "require $concrete_name;") {
carp $@;
return undef;
}
bless $self, $concrete_name;
$self->init(@args);
}
#------------------------------
=item best ENCODING
I<Class method, constructor.>
Exactly like new(), except that this defaults any unsupported encoding to
"binary", after raising a suitable warning (it's a fatal error if there's
no binary decoder).
my $decoder = best MIME::Decoder "x-gzip64";
Will either return a decoder, or a raise a fatal exception.
=cut
sub best {
my ($class, $enc, @args) = @_;
my $self = $class->new($enc, @args);
if (!$self) {
usage "unsupported encoding '$enc': using 'binary'";
$self = $class->new('binary') || croak "ack! no binary decoder!";
}
$self;
}
#------------------------------
=item decode INSTREAM,OUTSTREAM
I<Instance method.>
Decode the document waiting in the input handle INSTREAM,
writing the decoded information to the output handle OUTSTREAM.
Read the section in this document on I/O handles for more information
about the arguments. Note that you can still supply old-style
unblessed filehandles for INSTREAM and OUTSTREAM.
Returns true on success, throws exception on failure.
=cut
sub decode {
my ($self, $in, $out) = @_;
### Set up the default input record separator to be CRLF:
### $in->input_record_separator("\012\015");
### Invoke back-end method to do the work:
$self->decode_it($in, $out) ||
die "$ME: ".$self->encoding." decoding failed\n";
1;
}
#------------------------------
=item encode INSTREAM,OUTSTREAM
I<Instance method.>
Encode the document waiting in the input filehandle INSTREAM,
writing the encoded information to the output stream OUTSTREAM.
Read the section in this document on I/O handles for more information
about the arguments. Note that you can still supply old-style
unblessed filehandles for INSTREAM and OUTSTREAM.
Returns true on success, throws exception on failure.
=cut
sub encode {
my ($self, $in, $out, $textual_type) = @_;
### Invoke back-end method to do the work:
$self->encode_it($in, $out, $self->encoding eq 'quoted-printable' ? ($textual_type) : ()) ||
die "$ME: ".$self->encoding." encoding failed\n";
}
#------------------------------
=item encoding
I<Instance method.>
Return the encoding that this object was created to handle,
coerced to all lowercase (e.g., C<"base64">).
=cut
sub encoding {
shift->{MD_Encoding};
}
#------------------------------
=item head [HEAD]
I<Instance method.>
Completely optional: some decoders need to know a little about the file
they are encoding/decoding; e.g., x-uu likes to have the filename.
The HEAD is any object which responds to messages like:
$head->mime_attr('content-disposition.filename');
=cut
sub head {
my ($self, $head) = @_;
$self->{MD_Head} = $head if @_ > 1;
$self->{MD_Head};
}
#------------------------------
=item supported [ENCODING]
I<Class method.>
With one arg (an ENCODING name), returns truth if that encoding
is currently handled, and falsity otherwise. The ENCODING will
be automatically coerced to lowercase:
if (supported MIME::Decoder '7BIT') {
### yes, we can handle it...
}
else {
### drop back six and punt...
}
With no args, returns a reference to a hash of all available decoders,
where the key is the encoding name (all lowercase, like '7bit'),
and the value is true (it happens to be the name of the class
that handles the decoding, but you probably shouldn't rely on that).
You may safely modify this hash; it will I<not> change the way the
module performs its lookups. Only C<install> can do that.
I<Thanks to Achim Bohnet for suggesting this method.>
=cut
sub supported {
my ($class, $decoder) = @_;
defined($decoder) ? $DecoderFor{lc($decoder)}: { %DecoderFor };
}
#------------------------------
=back
=head2 Subclass interface
If you are writing (or installing) a new decoder subclass, there
are some other methods you'll need to know about:
=over 4
=item decode_it INSTREAM,OUTSTREAM
I<Abstract instance method.>
The back-end of the B<decode> method. It takes an input handle
opened for reading (INSTREAM), and an output handle opened for
writing (OUTSTREAM).
If you are writing your own decoder subclass, you must override this
method in your class. Your method should read from the input
handle via C<getline()> or C<read()>, decode this input, and print the
decoded data to the output handle via C<print()>. You may do this
however you see fit, so long as the end result is the same.
Note that unblessed references and globrefs are automatically turned
into I/O handles for you by C<decode()>, so you don't need to worry
about it.
Your method must return either C<undef> (to indicate failure),
or C<1> (to indicate success).
It may also throw an exception to indicate failure.
=cut
sub decode_it {
die "attempted to use abstract 'decode_it' method!";
}
=item encode_it INSTREAM,OUTSTREAM
I<Abstract instance method.>
The back-end of the B<encode> method. It takes an input handle
opened for reading (INSTREAM), and an output handle opened for
writing (OUTSTREAM).
If you are writing your own decoder subclass, you must override this
method in your class. Your method should read from the input
handle via C<getline()> or C<read()>, encode this input, and print the
encoded data to the output handle via C<print()>. You may do this
however you see fit, so long as the end result is the same.
Note that unblessed references and globrefs are automatically turned
into I/O handles for you by C<encode()>, so you don't need to worry
about it.
Your method must return either C<undef> (to indicate failure),
or C<1> (to indicate success).
It may also throw an exception to indicate failure.
=cut
sub encode_it {
die "attempted to use abstract 'encode_it' method!";
}
=item filter IN, OUT, COMMAND...
I<Class method, utility.>
If your decoder involves an external program, you can invoke
them easily through this method. The command must be a "filter": a
command that reads input from its STDIN (which will come from the IN argument)
and writes output to its STDOUT (which will go to the OUT argument).
For example, here's a decoder that un-gzips its data:
sub decode_it {
my ($self, $in, $out) = @_;
$self->filter($in, $out, "gzip -d -");
}
The usage is similar to IPC::Open2::open2 (which it uses internally),
so you can specify COMMAND as a single argument or as an array.
=cut
sub filter
{
my ($self, $in, $out, @cmd) = @_;
my $buf = '';
### Open pipe:
STDOUT->flush; ### very important, or else we get duplicate output!
my $kidpid = open2(my $child_out, my $child_in, @cmd) || die "@cmd: open2 failed: $!";
### We have to use select() for doing both reading and writing.
my $rsel = IO::Select->new( $child_out );
my $wsel = IO::Select->new( $child_in );
while (1) {
### Wait for one hour; if that fails, it's too bad.
my ($read, $write) = IO::Select->select( $rsel, $wsel, undef, 3600);
if( !defined $read && !defined $write ) {
kill 1, $kidpid;
waitpid $kidpid, 0;
die "@cmd: select failed: $!";
}
### If can read from child:
if( my $fh = shift @$read ) {
if( $fh->sysread(my $buf, 1024) ) {
$out->print($buf);
} else {
$rsel->remove($fh);
$fh->close();
}
}
### If can write to child:
if( my $fh = shift @$write ) {
if($in->read(my $buf, 1024)) {
local $SIG{PIPE} = sub {
warn "got SIGPIPE from @cmd";
$wsel->remove($fh);
$fh->close();
};
$fh->syswrite( $buf );
} else {
$wsel->remove($fh);
$fh->close();
}
}
### If both $child_out and $child_in are done:
last unless ($rsel->count() || $wsel->count());
}
### Wait for it:
waitpid($kidpid, 0) == $kidpid or die "@cmd: couldn't reap child $kidpid";
### Check if it failed:
$? == 0 or die "@cmd: bad exit status: \$? = $?";
1;
}
#------------------------------
=item init ARGS...
I<Instance method.>
Do any necessary initialization of the new instance,
taking whatever arguments were given to C<new()>.
Should return the self object on success, undef on failure.
=cut
sub init {
$_[0];
}
#------------------------------
=item install ENCODINGS...
I<Class method>.
Install this class so that each encoding in ENCODINGS is handled by it:
install MyBase64Decoder 'base64', 'x-base64super';
You should not override this method.
=cut
sub install {
my $class = shift;
$DecoderFor{lc(shift @_)} = $class while (@_);
}
#------------------------------
=item uninstall ENCODINGS...
I<Class method>.
Uninstall support for encodings. This is a way to turn off the decoding
of "experimental" encodings. For safety, always use MIME::Decoder directly:
uninstall MIME::Decoder 'x-uu', 'x-uuencode';
You should not override this method.
=cut
sub uninstall {
shift;
$DecoderFor{lc(shift @_)} = undef while (@_);
}
1;
__END__
#------------------------------
=back
=head1 DECODER SUBCLASSES
You don't need to C<"use"> any other Perl modules; the
following "standard" subclasses are included as part of MIME::Decoder:
Class: Handles encodings:
------------------------------------------------------------
MIME::Decoder::Binary binary
MIME::Decoder::NBit 7bit, 8bit
MIME::Decoder::Base64 base64
MIME::Decoder::QuotedPrint quoted-printable
The following "non-standard" subclasses are also included:
Class: Handles encodings:
------------------------------------------------------------
MIME::Decoder::UU x-uu, x-uuencode
MIME::Decoder::Gzip64 x-gzip64 ** requires gzip!
=head1 NOTES
=head2 Input/Output handles
As of MIME-tools 2.0, this class has to play nice with the new MIME::Body
class... which means that input and output routines cannot just assume that
they are dealing with filehandles.
Therefore, all that MIME::Decoder and its subclasses require (and, thus,
all that they can assume) is that INSTREAMs and OUTSTREAMs are objects
which respond to a subset of the messages defined in the IO::Handle
interface; minimally:
print
getline
read(BUF,NBYTES)
I<Thanks to Achim Bohnet for suggesting this more-generic I/O model.>
=head2 Writing a decoder
If you're experimenting with your own encodings, you'll probably want
to write a decoder. Here are the basics:
=over 4
=item 1.
Create a module, like "MyDecoder::", for your decoder.
Declare it to be a subclass of MIME::Decoder.
=item 2.
Create the following instance methods in your class, as described above:
decode_it
encode_it
init
=item 3.
In your application program, activate your decoder for one or
more encodings like this:
require MyDecoder;
install MyDecoder "7bit"; ### use MyDecoder to decode "7bit"
install MyDecoder "x-foo"; ### also use MyDecoder to decode "x-foo"
=back
To illustrate, here's a custom decoder class for the C<quoted-printable>
encoding:
package MyQPDecoder;
@ISA = qw(MIME::Decoder);
use MIME::Decoder;
use MIME::QuotedPrint;
### decode_it - the private decoding method
sub decode_it {
my ($self, $in, $out) = @_;
local $_;
while (defined($_ = $in->getline)) {
my $decoded = decode_qp($_);
$out->print($decoded);
}
1;
}
### encode_it - the private encoding method
sub encode_it {
my ($self, $in, $out) = @_;
my ($buf, $nread) = ('', 0);
while ($in->read($buf, 60)) {
my $encoded = encode_qp($buf);
$out->print($encoded);
}
1;
}
That's it. The task was pretty simple because the C<"quoted-printable">
encoding can easily be converted line-by-line... as can
even C<"7bit"> and C<"8bit"> (since all these encodings guarantee
short lines, with a max of 1000 characters).
The good news is: it is very likely that it will be similarly-easy to
write a MIME::Decoder for any future standard encodings.
The C<"binary"> decoder, however, really required block reads and writes:
see L<"MIME::Decoder::Binary"> for details.
=head1 SEE ALSO
L<MIME::Tools>, other MIME::Decoder subclasses.
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
1;

View File

@ -0,0 +1,137 @@
package MIME::Decoder::Base64;
use strict;
use warnings;
=head1 NAME
MIME::Decoder::Base64 - encode/decode a "base64" stream
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
=head1 DESCRIPTION
A L<MIME::Decoder> subclass for the C<"base64"> encoding.
The name was chosen to jibe with the pre-existing MIME::Base64
utility package, which this class actually uses to translate each chunk.
=over 4
=item *
When B<decoding>, the input is read one line at a time.
The input accumulates in an internal buffer, which is decoded in
multiple-of-4-sized chunks (plus a possible "leftover" input chunk,
of course).
=item *
When B<encoding>, the input is read 45 bytes at a time: this ensures
that the output lines are not too long. We chose 45 since it is
a multiple of 3 and produces lines under 76 characters, as RFC 2045
specifies:
The encoded output stream must be represented in lines of no more
than 76 characters each.
=back
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
use vars qw(@ISA $VERSION);
use MIME::Decoder;
use MIME::Base64 2.04;
use MIME::Tools qw(debug);
@ISA = qw(MIME::Decoder);
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
### How many bytes to encode at a time (must be a multiple of 3, and
### less than (76 * 0.75)!
my $EncodeChunkLength = 45;
### How many bytes to decode at a time?
my $DecodeChunkLength = 32 * 1024;
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
my $len_4xN;
### Create a suitable buffer:
my $buffer = ' ' x (120 + $DecodeChunkLength); $buffer = '';
debug "in = $in; out = $out";
### Get chunks until done:
local($_) = ' ' x $DecodeChunkLength;
while ($in->read($_, $DecodeChunkLength)) {
tr{A-Za-z0-9+/}{}cd; ### get rid of non-base64 chars
### Concat any new input onto any leftover from the last round:
$buffer .= $_;
length($buffer) >= $DecodeChunkLength or next;
### Extract substring with highest multiple of 4 bytes:
### 0 means not enough to work with... get more data!
$len_4xN = length($buffer) & ~3;
### Partition into largest-multiple-of-4 (which we decode),
### and the remainder (which gets handled next time around):
$out->print(decode_base64(substr($buffer, 0, $len_4xN)));
$buffer = substr($buffer, $len_4xN);
}
### No more input remains. Dispose of anything left in buffer:
if (length($buffer)) {
### Pad to 4-byte multiple, and decode:
$buffer .= "==="; ### need no more than 3 pad chars
$len_4xN = length($buffer) & ~3;
### Decode it!
$out->print(decode_base64(substr($buffer, 0, $len_4xN)));
}
1;
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out) = @_;
my $encoded;
my $nread;
my $buf = '';
my $nl = $MIME::Entity::BOUNDARY_DELIMITER || "\n";
while ($nread = $in->read($buf, $EncodeChunkLength)) {
$encoded = encode_base64($buf, $nl);
$encoded .= $nl unless ($encoded =~ /$nl\Z/); ### ensure newline!
$out->print($encoded);
}
1;
}
#------------------------------
1;

View File

@ -0,0 +1,183 @@
package MIME::Decoder::BinHex;
use strict;
use warnings;
=head1 NAME
MIME::Decoder::BinHex - decode a "binhex" stream
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
Also supports a preamble() method to recover text before
the binhexed portion of the stream.
=head1 DESCRIPTION
A MIME::Decoder subclass for a nonstandard encoding whereby
data are binhex-encoded. Common non-standard MIME encodings for this:
x-uu
x-uuencode
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Julian Field (F<mailscanner@ecs.soton.ac.uk>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
require 5.002;
use vars qw(@ISA $VERSION);
use MIME::Decoder;
use MIME::Tools qw(whine);
use Convert::BinHex;
@ISA = qw(MIME::Decoder);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
my ($mode, $file);
my (@preamble, @data);
my $H2B = Convert::BinHex->hex2bin;
my $line;
$self->{MDU_Preamble} = \@preamble;
$self->{MDU_Mode} = '600';
$self->{MDU_File} = undef;
### Find beginning...
local $_;
while (defined($_ = $in->getline)) {
if (/^\(This file must be converted/) {
$_ = $in->getline;
last if /^:/;
}
push @preamble, $_;
}
die("binhex decoding: fell off end of file\n") if !defined($_);
### Decode:
my $data;
$data = $H2B->next($_); # or whine("Next error is $@ $!\n");
my $len = unpack("C", $data);
while ($len > length($data)+21 && defined($line = $in->getline)) {
$data .= $H2B->next($line);
}
if (length($data) >= 22+$len) {
$data = substr($data, 22+$len);
} else {
$data = '';
}
$out->print($data);
while (defined($_ = $in->getline)) {
$line = $_;
$data = $H2B->next($line);
$out->print($data);
last if $line =~ /:$/;
}
1;
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out) = @_;
my $line;
my $buf = '';
my $fname = (($self->head &&
$self->head->mime_attr('content-disposition.filename')) ||
'');
my $B2H = Convert::BinHex->bin2hex;
$out->print("(This file must be converted with BinHex 4.0)\n");
# Sigh... get length of file
$in->seek(0, 2);
my $datalen = $in->tell();
$in->seek(0, 0);
# Build header in core:
my @hdrs;
my $flen = length($fname);
push @hdrs, pack("C", $flen);
push @hdrs, pack("a$flen", $fname);
push @hdrs, pack('C', 4);
push @hdrs, pack('a4', '????');
push @hdrs, pack('a4', '????');
push @hdrs, pack('n', 0);
push @hdrs, pack('N', $datalen);
push @hdrs, pack('N', 0); # Resource length
my $hdr = join '', @hdrs;
# Compute the header CRC:
my $crc = Convert::BinHex::binhex_crc("\000\000",
Convert::BinHex::binhex_crc($hdr, 0));
# Output the header (plus its CRC):
$out->print($B2H->next($hdr . pack('n', $crc)));
while ($in->read($buf, 1000)) {
$out->print($B2H->next($buf));
}
$out->print($B2H->done);
1;
}
#------------------------------
#
# last_preamble
#
# Return the last preamble as ref to array of lines.
# Gets reset by decode_it().
#
sub last_preamble {
my $self = shift;
return $self->{MDU_Preamble} || [];
}
#------------------------------
#
# last_mode
#
# Return the last mode.
# Gets reset to undef by decode_it().
#
sub last_mode {
shift->{MDU_Mode};
}
#------------------------------
#
# last_filename
#
# Return the last filename.
# Gets reset by decode_it().
#
sub last_filename {
shift->{MDU_File} || undef; #[];
}
#------------------------------
1;

View File

@ -0,0 +1,86 @@
package MIME::Decoder::Binary;
use strict;
use warnings;
=head1 NAME
MIME::Decoder::Binary - perform no encoding/decoding
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
=head1 DESCRIPTION
A MIME::Decoder subclass for the C<"binary"> encoding (in other words,
no encoding).
The C<"binary"> decoder is a special case, since it's ill-advised
to read the input line-by-line: after all, an uncompressed image file might
conceivably have loooooooooong stretches of bytes without a C<"\n"> among
them, and we don't want to risk blowing out our core. So, we
read-and-write fixed-size chunks.
Both the B<encoder> and B<decoder> do a simple pass-through of the data
from input to output.
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
use MIME::Decoder;
use vars qw(@ISA $VERSION);
@ISA = qw(MIME::Decoder);
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
### Buffer length:
my $BUFLEN = 8192;
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
my ($buf, $nread) = ('', 0);
while ($nread = $in->read($buf, $BUFLEN)) {
$out->print($buf);
}
defined($nread) or return undef; ### check for error
1;
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out) = @_;
my ($buf, $nread) = ('', 0);
while ($nread = $in->read($buf, $BUFLEN)) {
$out->print($buf);
}
defined($nread) or return undef; ### check for error
1;
}
#------------------------------
1;

View File

@ -0,0 +1,111 @@
package MIME::Decoder::Gzip64;
=head1 NAME
MIME::Decoder::Gzip64 - decode a "base64" gzip stream
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
=head1 DESCRIPTION
A MIME::Decoder::Base64 subclass for a nonstandard encoding whereby
data are gzipped, then the gzipped file is base64-encoded.
Common non-standard MIME encodings for this:
x-gzip64
Since this class relies on external programs which may not
exist on your machine, MIME-tools does not "install" it by default.
To use it, you need to say in your main program:
install MIME::Decoder::Gzip64 'x-gzip64';
Note: if this class isn't working for you, you may need to change the
commands it runs. In your main program, you can do so by setting up
the two commands which handle the compression/decompression.
use MIME::Decoder::Gzip64;
$MIME::Decoder::Gzip64::GZIP = 'gzip -c';
$MIME::Decoder::Gzip64::GUNZIP = 'gzip -d -c';
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
require 5.002;
use strict;
use vars qw(@ISA $VERSION $GZIP $GUNZIP);
use MIME::Decoder;
use MIME::Base64;
use MIME::Decoder::Base64;
use MIME::Tools qw(tmpopen whine);
# Inheritance:
@ISA = qw(MIME::Decoder::Base64);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
# How to compress stdin to stdout:
$GZIP = "gzip -c";
# How to UNcompress stdin to stdout:
$GUNZIP = "gzip -d -c";
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
# Open a temp file (assume the worst, that this is a big stream):
my $tmp = tmpopen() || die "can't get temp file";
# Stage 1: decode the base64'd stream into zipped data:
$self->SUPER::decode_it($in, $tmp) or die "base64 decoding failed!";
# Stage 2: un-zip the zipped data:
$tmp->seek(0, 0);
$self->filter($tmp, $out, $GUNZIP) or die "gzip decoding failed!";
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out) = @_;
whine "Encoding ", $self->encoding, " is not standard MIME!";
# Open a temp file (assume the worst, that this is a big stream):
my $tmp = tmpopen() || die "can't get temp file";
# Stage 1: zip the raw data:
$self->filter($in, $tmp, $GZIP) or die "gzip encoding failed!";
# Stage 2: encode the zipped data via base64:
$tmp->seek(0, 0);
$self->SUPER::encode_it($tmp, $out) or die "base64 encoding failed!";
}
#------------------------------
1;

View File

@ -0,0 +1,160 @@
package MIME::Decoder::NBit;
use strict;
use warnings;
=head1 NAME
MIME::Decoder::NBit - encode/decode a "7bit" or "8bit" stream
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
=head1 DESCRIPTION
This is a MIME::Decoder subclass for the C<7bit> and C<8bit> content
transfer encodings. These are not "encodings" per se: rather, they
are simply assertions of the content of the message.
From RFC-2045 Section 6.2.:
Three transformations are currently defined: identity, the "quoted-
printable" encoding, and the "base64" encoding. The domains are
"binary", "8bit" and "7bit".
The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
mean that the identity (i.e. NO) encoding transformation has been
performed. As such, they serve simply as indicators of the domain of
the body data, and provide useful information about the sort of
encoding that might be needed for transmission in a given transport
system.
In keeping with this: as of MIME-tools 4.x,
I<this class does no modification of its input when encoding;>
all it does is attempt to I<detect violations> of the 7bit/8bit assertion,
and issue a warning (one per message) if any are found.
=head2 Legal 7bit data
RFC-2045 Section 2.7 defines legal C<7bit> data:
"7bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]. No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences.
=head2 Legal 8bit data
RFC-2045 Section 2.8 defines legal C<8bit> data:
"8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]), but octets with decimal values greater than 127
may be used. As with "7bit data" CR and LF octets only occur as part
of CRLF line separation sequences and no NULs are allowed.
=head2 How decoding is done
The B<decoder> does a line-by-line pass-through from input to output,
leaving the data unchanged I<except> that an end-of-line sequence of
CRLF is converted to a newline "\n". Given the line-oriented nature
of 7bit and 8bit, this seems relatively sensible.
=head2 How encoding is done
The B<encoder> does a line-by-line pass-through from input to output,
and simply attempts to I<detect> violations of the C<7bit>/C<8bit>
domain. The default action is to warn once per encoding if violations
are detected; the warnings may be silenced with the QUIET configuration
of L<MIME::Tools>.
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
use vars qw(@ISA $VERSION);
use MIME::Decoder;
use MIME::Tools qw(:msgs);
@ISA = qw(MIME::Decoder);
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
### How many bytes to decode at a time?
my $DecodeChunkLength = 8 * 1024;
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
my $and_also;
### Allocate a buffer suitable for a chunk and a line:
local $_ = (' ' x ($DecodeChunkLength + 1024)); $_ = '';
### Get chunks until done:
while ($in->read($_, $DecodeChunkLength)) {
$and_also = $in->getline;
$_ .= $and_also if defined($and_also);
### Just got a chunk ending in a line.
s/\015\012$/\n/g;
$out->print($_);
}
1;
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out) = @_;
my $saw_8bit = 0; ### warn them ONCE PER ENCODING if 8-bit data exists
my $saw_long = 0; ### warn them ONCE PER ENCODING if long lines exist
my $seven_bit = ($self->encoding eq '7bit'); ### 7bit?
my $line;
while (defined($line = $in->getline)) {
### Whine if encoding is 7bit and it has 8-bit data:
if ($seven_bit && ($line =~ /[\200-\377]/)) { ### oops! saw 8-bit data!
whine "saw 8-bit data while encoding 7bit" unless $saw_8bit++;
}
### Whine if long lines detected:
if (length($line) > 998) {
whine "saw long line while encoding 7bit/8bit" unless $saw_long++;
}
### Output!
$out->print($line);
}
1;
}
1;

View File

@ -0,0 +1,159 @@
package MIME::Decoder::QuotedPrint;
use strict;
use warnings;
=head1 NAME
MIME::Decoder::QuotedPrint - encode/decode a "quoted-printable" stream
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
=head1 DESCRIPTION
A MIME::Decoder subclass for the C<"quoted-printable"> encoding.
The name was chosen to jibe with the pre-existing MIME::QuotedPrint
utility package, which this class actually uses to translate each line.
=over 4
=item *
The B<decoder> does a line-by-line translation from input to output.
=item *
The B<encoder> does a line-by-line translation, breaking lines
so that they fall under the standard 76-character limit for this
encoding.
=back
B<Note:> just like MIME::QuotedPrint, we currently use the
native C<"\n"> for line breaks, and not C<CRLF>. This may
need to change in future versions.
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
use vars qw(@ISA $VERSION);
use MIME::Decoder;
use MIME::QuotedPrint;
@ISA = qw(MIME::Decoder);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
#------------------------------
# If we have MIME::QuotedPrint 3.03 or later, use the three-argument
# version. If we have an earlier version of MIME::QuotedPrint, we
# may get the wrong results. However, on some systems (RH Linux,
# for example), MIME::QuotedPrint is part of the Perl package and
# upgrading it separately breaks their magic auto-update tools.
# We are supporting older versions of MIME::QuotedPrint even though
# they may give incorrect results simply because it's too painful
# for many people to upgrade.
# The following code is horrible. I know. Beat me up. --dfs
BEGIN {
if (!defined(&encode_qp_threearg)) {
if ($::MIME::QuotedPrint::VERSION >= 3.03) {
eval 'sub encode_qp_threearg ( $$$ ) { encode_qp(shift, shift, shift); }';
} else {
eval 'sub encode_qp_threearg ( $$$ ) { encode_qp(shift); }';
}
}
}
#------------------------------
#
# encode_qp_really STRING TEXTUAL_TYPE_FLAG
#
# Encode QP, and then follow guideline 8 from RFC 2049 (thanks to Denis
# N. Antonioli) whereby we make things a little safer for the transport
# and storage of messages. WARNING: we can only do this if the line won't
# grow beyond 76 characters!
#
sub encode_qp_really {
my $enc = encode_qp_threearg(shift, undef, not shift);
if (length($enc) < 74) {
$enc =~ s/^\.\n/=2E\n/g; # force encoding of /^\.$/
$enc =~ s/^From /=46rom /g; # force encoding of /^From /
}
$enc;
}
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
my $init = 0;
my $badpdf = 0;
local $_;
while (defined($_ = $in->getline)) {
#
# Dirty hack to fix QP-Encoded PDFs from MS-Outlook.
#
# Check if we have a PDF file and if it has been encoded
# on Windows. Unix encoded files are fine. If we have
# one encoded CR after the PDF init string but are missing
# an encoded CR before the newline this means the PDF is broken.
#
if (!$init) {
$init = 1;
if ($_ =~ /^%PDF-[0-9\.]+=0D/ && $_ !~ /=0D\n$/) {
$badpdf = 1;
}
}
#
# Decode everything with decode_qp() except corrupted PDFs.
#
if ($badpdf) {
my $output = $_;
$output =~ s/[ \t]+?(\r?\n)/$1/g;
$output =~ s/=\r?\n//g;
$output =~ s/(^|[^\r])\n\Z/$1\r\n/;
$output =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;
$out->print($output);
} else {
$out->print(decode_qp($_));
}
}
1;
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out, $textual_type) = @_;
local $_;
while (defined($_ = $in->getline)) {
$out->print(encode_qp_really($_, $textual_type));
}
1;
}
#------------------------------
1;

View File

@ -0,0 +1,151 @@
package MIME::Decoder::UU;
use strict;
use warnings;
=head1 NAME
MIME::Decoder::UU - decode a "uuencoded" stream
=head1 SYNOPSIS
A generic decoder object; see L<MIME::Decoder> for usage.
Also supports a preamble() method to recover text before
the uuencoded portion of the stream.
=head1 DESCRIPTION
A MIME::Decoder subclass for a nonstandard encoding whereby
data are uuencoded. Common non-standard MIME encodings for this:
x-uu
x-uuencode
=head1 SEE ALSO
L<MIME::Decoder>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
UU-decoding code lifted from "uuexplode", a Perl script by an
unknown author...
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
=cut
require 5.002;
use vars qw(@ISA $VERSION);
use MIME::Decoder;
use MIME::Tools qw(whine);
@ISA = qw(MIME::Decoder);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
#------------------------------
#
# decode_it IN, OUT
#
sub decode_it {
my ($self, $in, $out) = @_;
my ($mode, $file);
my @preamble;
### Init:
$self->{MDU_Preamble} = \@preamble;
$self->{MDU_Mode} = undef;
$self->{MDU_File} = undef;
### Find beginning...
local $_;
while (defined($_ = $in->getline)) {
if (/^begin(.*)/) { ### found it: now decode it...
my $modefile = $1;
if ($modefile =~ /^(\s+(\d+))?(\s+(.*?\S))?\s*\Z/) {
($mode, $file) = ($2, $4);
}
last; ### decoded or not, we're done
}
push @preamble, $_;
}
die("uu decoding: no begin found\n") if !defined($_); # hit eof!
### Store info:
$self->{MDU_Mode} = $mode;
$self->{MDU_File} = $file;
### Decode:
while (defined($_ = $in->getline)) {
last if /^end/;
next if /[a-z]/;
next unless int((((ord() - 32) & 077) + 2) / 3) == int(length() / 4);
$out->print(unpack('u', $_));
}
### chmod oct($mode), $file; # sheeyeah... right...
whine "file incomplete, no end found\n" if !defined($_); # eof
1;
}
#------------------------------
#
# encode_it IN, OUT
#
sub encode_it {
my ($self, $in, $out) = @_;
my $buf = '';
my $fname = (($self->head &&
$self->head->mime_attr('content-disposition.filename')) ||
'');
my $nl = $MIME::Entity::BOUNDARY_DELIMITER || "\n";
$out->print("begin 644 $fname$nl");
while ($in->read($buf, 45)) { $out->print(pack('u', $buf)) }
$out->print("end$nl");
1;
}
#------------------------------
#
# last_preamble
#
# Return the last preamble as ref to array of lines.
# Gets reset by decode_it().
#
sub last_preamble {
my $self = shift;
return $self->{MDU_Preamble} || [];
}
#------------------------------
#
# last_mode
#
# Return the last mode.
# Gets reset to undef by decode_it().
#
sub last_mode {
shift->{MDU_Mode};
}
#------------------------------
#
# last_filename
#
# Return the last filename.
# Gets reset by decode_it().
#
sub last_filename {
shift->{MDU_File} || [];
}
#------------------------------
1;

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,63 @@
package MIME::Field::ConTraEnc;
=head1 NAME
MIME::Field::ConTraEnc - a "Content-transfer-encoding" field
=head1 DESCRIPTION
A subclass of Mail::Field.
I<Don't use this class directly... its name may change in the future!>
Instead, ask Mail::Field for new instances based on the field name!
=head1 SYNOPSIS
use Mail::Field;
use MIME::Head;
# Create an instance from some text:
$field = Mail::Field->new('Content-transfer-encoding', '7bit');
# Get the encoding.
# Possible values: 'binary', '7bit', '8bit', 'quoted-printable',
# 'base64' and '' (unspecified). Note that there can't be a
# single default for this, since it depends on the content type!
$encoding = $field->encoding;
=head1 SEE ALSO
L<MIME::Field::ParamVal>, L<Mail::Field>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
=cut
require 5.001;
use strict;
use MIME::Field::ParamVal;
use vars qw($VERSION @ISA);
@ISA = qw(MIME::Field::ParamVal);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
# Install it:
bless([])->register('Content-transfer-encoding');
#------------------------------
sub encoding {
shift->paramstr('_', @_);
}
#------------------------------
1;

View File

@ -0,0 +1,68 @@
package MIME::Field::ContDisp;
=head1 NAME
MIME::Field::ContDisp - a "Content-disposition" field
=head1 DESCRIPTION
A subclass of Mail::Field.
I<Don't use this class directly... its name may change in the future!>
Instead, ask Mail::Field for new instances based on the field name!
=head1 SYNOPSIS
use Mail::Field;
use MIME::Head;
# Create an instance from some text:
$field = Mail::Field->new('Content-disposition', $text);
# Inline or attachment?
$type = $field->type;
# Recommended filename?
$filename = $field->filename;
=head1 SEE ALSO
L<MIME::Field::ParamVal>, L<Mail::Field>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
=cut
require 5.001;
use strict;
use MIME::Field::ParamVal;
use vars qw($VERSION @ISA);
@ISA = qw(MIME::Field::ParamVal);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
# Install it:
bless([])->register('Content-disposition');
#------------------------------
sub filename {
shift->paramstr('filename', @_);
}
sub type {
shift->paramstr('_', @_);
}
#------------------------------
1;

View File

@ -0,0 +1,196 @@
package MIME::Field::ContType;
=head1 NAME
MIME::Field::ContType - a "Content-type" field
=head1 DESCRIPTION
A subclass of Mail::Field.
I<Don't use this class directly... its name may change in the future!>
Instead, ask Mail::Field for new instances based on the field name!
=head1 SYNOPSIS
use Mail::Field;
use MIME::Head;
# Create an instance from some text:
$field = Mail::Field->new('Content-type',
'text/HTML; charset="US-ASCII"');
# Get the MIME type, like 'text/plain' or 'x-foobar'.
# Returns 'text/plain' as default, as per RFC 2045:
my ($type, $subtype) = split('/', $field->type);
# Get generic information:
print $field->name;
# Get information related to "message" type:
if ($type eq 'message') {
print $field->id;
print $field->number;
print $field->total;
}
# Get information related to "multipart" type:
if ($type eq 'multipart') {
print $field->boundary; # the basic value, fixed up
print $field->multipart_boundary; # empty if not a multipart message!
}
# Get information related to "text" type:
if ($type eq 'text') {
print $field->charset; # returns 'us-ascii' as default
}
=head1 PUBLIC INTERFACE
=over 4
=cut
require 5.001;
use strict;
use MIME::Field::ParamVal;
use vars qw($VERSION @ISA);
@ISA = qw(MIME::Field::ParamVal);
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
# Install it:
bless([])->register('Content-type');
#------------------------------
#
# Basic access/storage methods...
#
sub charset {
lc(shift->paramstr('charset', @_)) || 'us-ascii'; # RFC 2045
}
sub id {
shift->paramstr('id', @_);
}
sub name {
shift->paramstr('name', @_);
}
sub number {
shift->paramstr('number', @_);
}
sub total {
shift->paramstr('total', @_);
}
#------------------------------
=item boundary
Return the boundary field. The boundary is returned exactly
as given in the C<Content-type:> field; that is, the leading
double-hyphen (C<-->) is I<not> prepended.
(Well, I<almost> exactly... from RFC 2046:
(If a boundary appears to end with white space, the white space
must be presumed to have been added by a gateway, and must be deleted.)
so we oblige and remove any trailing spaces.)
Returns the empty string if there is no boundary, or if the boundary is
illegal (e.g., if it is empty after all trailing whitespace has been
removed).
=cut
sub boundary {
my $value = shift->param('boundary', @_);
defined($value) || return '';
$value =~ s/\s+$//; # kill trailing white, per RFC 2046
$value;
}
#------------------------------
=item multipart_boundary
Like C<boundary()>, except that this will also return the empty
string if the message is not a multipart message. In other words,
there's an automatic sanity check.
=cut
sub multipart_boundary {
my $self = shift;
my ($type) = split('/', $self->type);
return '' if ($type ne 'multipart'); # not multipart!
$self->boundary; # okay, return the boundary
}
#------------------------------
=item type
Try real hard to determine the content type (e.g., C<"text/plain">,
C<"image/gif">, C<"x-weird-type">, which is returned
in all-lowercase.
A happy thing: the following code will work just as you would want,
even if there's no subtype (as in C<"x-weird-type">)... in such a case,
the $subtype would simply be the empty string:
($type, $subtype) = split('/', $head->mime_type);
If the content-type information is missing, it defaults to C<"text/plain">,
as per RFC 2045:
Default RFC 2822 messages are typed by this protocol as plain text in
the US-ASCII character set, which can be explicitly specified as
"Content-type: text/plain; charset=us-ascii". If no Content-Type is
specified, this default is assumed.
B<Note:> under the "be liberal in what we accept" principle, this routine
no longer syntax-checks the content type. If it ain't empty,
just downcase and return it.
=cut
sub type {
lc(shift->paramstr('_', @_)) || 'text/plain'; # RFC 2045
}
#------------------------------
=back
=head1 NOTES
Since nearly all (if not all) parameters must have non-empty values
to be considered valid, we just return the empty string to signify
missing fields. If you need to get the I<real> underlying value,
use the inherited C<param()> method (which returns undef if the
parameter is missing).
=head1 SEE ALSO
L<MIME::Field::ParamVal>, L<Mail::Field>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
=cut
1;

View File

@ -0,0 +1,416 @@
package MIME::Field::ParamVal;
use MIME::Words;
=head1 NAME
MIME::Field::ParamVal - subclass of Mail::Field, for structured MIME fields
=head1 SYNOPSIS
# Create an object for a content-type field:
$field = new Mail::Field 'Content-type';
# Set some attributes:
$field->param('_' => 'text/html');
$field->param('charset' => 'us-ascii');
$field->param('boundary' => '---ABC---');
# Same:
$field->set('_' => 'text/html',
'charset' => 'us-ascii',
'boundary' => '---ABC---');
# Get an attribute, or undefined if not present:
print "no id!" if defined($field->param('id'));
# Same, but use empty string for missing values:
print "no id!" if ($field->paramstr('id') eq '');
# Output as string:
print $field->stringify, "\n";
=head1 DESCRIPTION
This is an abstract superclass of most MIME fields. It handles
fields with a general syntax like this:
Content-Type: Message/Partial;
number=2; total=3;
id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
Comments are supported I<between> items, like this:
Content-Type: Message/Partial; (a comment)
number=2 (another comment) ; (yet another comment) total=3;
id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
=head1 PUBLIC INTERFACE
=over 4
=cut
#------------------------------
require 5.001;
# Pragmas:
use strict;
use re 'taint';
use vars qw($VERSION @ISA);
# Other modules:
use Mail::Field;
# Kit modules:
use MIME::Tools qw(:config :msgs);
@ISA = qw(Mail::Field);
#------------------------------
#
# Public globals...
#
#------------------------------
# The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
#------------------------------
#
# Private globals...
#
#------------------------------
# Pattern to match parameter names (like fieldnames, but = not allowed):
my $PARAMNAME = '[^\x00-\x1f\x80-\xff :=]+';
# Pattern to match the first value on the line:
my $FIRST = '[^\s\;\x00-\x1f\x80-\xff]*';
# Pattern to match an RFC 2045 token:
#
# token = 1*<any (ASCII) CHAR except SPACE, CTLs, or tspecials>
#
my $TSPECIAL = '()<>@,;:\</[]?="';
#" Fix emacs highlighting...
my $TOKEN = '[^ \x00-\x1f\x80-\xff' . "\Q$TSPECIAL\E" . ']+';
my $QUOTED_STRING = '"([^\\\\"]*(?:\\\\.(?:[^\\\\"]*))*)"';
# Encoded token:
my $ENCTOKEN = "=\\?[^?]*\\?[A-Za-z]\\?[^?]+\\?=";
# Pattern to match spaces or comments:
my $SPCZ = '(?:\s|\([^\)]*\))*';
# Pattern to match non-semicolon as fallback for broken MIME
# produced by some viruses
my $BADTOKEN = '[^;]+';
#------------------------------
#
# Class init...
#
#------------------------------
#------------------------------
=item set [\%PARAMHASH | KEY=>VAL,...,KEY=>VAL]
I<Instance method.> Set this field.
The paramhash should contain parameter names
in I<all lowercase>, with the special C<"_"> parameter name
signifying the "default" (unnamed) parameter for the field:
# Set up to be...
#
# Content-type: Message/Partial; number=2; total=3; id="ocj=pbe0M2"
#
$conttype->set('_' => 'Message/Partial',
'number' => 2,
'total' => 3,
'id' => "ocj=pbe0M2");
Note that a single argument is taken to be a I<reference> to
a paramhash, while multiple args are taken to be the elements
of the paramhash themselves.
Supplying undef for a hashref, or an empty set of values, effectively
clears the object.
The self object is returned.
=cut
sub set {
my $self = shift;
my $params = ((@_ == 1) ? (shift || {}) : {@_});
%$self = %$params; # set 'em
$self;
}
#------------------------------
=item parse_params STRING
I<Class/instance utility method.>
Extract parameter info from a structured field, and return
it as a hash reference. For example, here is a field with parameters:
Content-Type: Message/Partial;
number=2; total=3;
id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
Here is how you'd extract them:
$params = $class->parse_params('content-type');
if ($$params{'_'} eq 'message/partial') {
$number = $$params{'number'};
$total = $$params{'total'};
$id = $$params{'id'};
}
Like field names, parameter names are coerced to lowercase.
The special '_' parameter means the default parameter for the
field.
B<NOTE:> This has been provided as a public method to support backwards
compatibility, but you probably shouldn't use it.
=cut
sub rfc2231decode {
my($val) = @_;
my($enc, $lang, $rest);
local($1,$2,$3);
if ($val =~ m/^([^']*)'([^']*)'(.*)\z/s) {
$enc = $1;
$lang = $2;
$rest = $3;
} elsif ($val =~ m/^([^']*)'([^']*)\z/s) {
$enc = $1;
$rest = $2;
} else {
$rest = $val;
# $enc remains undefined when charset/language info is missing
}
return ($enc, $lang, $rest);
}
sub rfc2231percent {
# Do percent-substitution
my($str) = @_;
local $1;
$str =~ s/%([0-9a-fA-F]{2})/pack("C", hex($1))/ge;
return $str;
}
sub parse_params {
my ($self, $raw) = @_;
my %params;
my %rfc2231params;
my %rfc2231encoding_is_used;
my $param;
my $val;
my $part;
# Get raw field, and unfold it:
defined($raw) or $raw = '';
$raw =~ s/\n//g;
$raw =~ s/\s+\z//; # Strip trailing whitespace
local($1,$2,$3,$4,$5);
# Extract special first parameter:
$raw =~ m/\A$SPCZ($FIRST)$SPCZ/og or return {}; # nada!
$params{'_'} = $1;
# Extract subsequent parameters.
# No, we can't just "split" on semicolons: they're legal in quoted strings!
while (1) { # keep chopping away until done...
$raw =~ m/\G[^;]*(\;$SPCZ)+/og or last; # skip leading separator
$raw =~ m/\G($PARAMNAME)\s*=\s*/og or last; # give up if not a param
$param = lc($1);
$raw =~ m/\G(?:$QUOTED_STRING|($ENCTOKEN)|($TOKEN)|($BADTOKEN))/g or last; # give up if no value"
my ($qstr, $enctoken, $token, $badtoken) = ($1, $2, $3, $4, $5);
if (defined($qstr)) {
# unescape
$qstr =~ s/\\(.)/$1/g;
}
if (defined($badtoken)) {
# Strip leading/trailing whitespace from badtoken
$badtoken =~ s/^\s+//;
$badtoken =~ s/\s+\z//;
# Only keep token parameters in badtoken;
# cut it off at the first non-token char. CPAN RT #105455
$badtoken =~ /^($TOKEN)*/;
$badtoken = $1;
# Cut it off at first whitespace too
$badtoken =~ s/\s.*//;
}
$val = defined($qstr) ? $qstr :
(defined($enctoken) ? $enctoken :
(defined($badtoken) ? $badtoken : $token));
# Do RFC 2231 processing
# Pick out the parts of the parameter
if ($param =~ /\*/ &&
$param =~ /^ ([^*]+) (?: \* ([^*]+) )? (\*)? \z/xs) {
# We have param*number* or param*number or param*
my($name, $num) = ($1, $2||0);
if (defined($3)) {
# We have param*number* or param*
# RFC 2231: Asterisks ("*") are reused to provide the
# indicator that language and character set information
# is present and encoding is being used
$val = rfc2231percent($val);
$rfc2231encoding_is_used{$name} = 1;
}
$rfc2231params{$name}{$num} .= $val;
} else {
# Assign non-rfc2231 value directly. If we
# did get a mix of rfc2231 and non-rfc2231 values,
# the non-rfc2231 will be blown away in the
# "extract reconstructed parameters" loop.
$params{$param} = $val;
}
}
# Extract reconstructed parameters
foreach $param (keys %rfc2231params) {
# If we got any rfc-2231 parameters, then
# blow away any potential non-rfc-2231 parameter.
$params{$param} = '';
foreach $part (sort { $a <=> $b } keys %{$rfc2231params{$param}}) {
$params{$param} .= $rfc2231params{$param}{$part};
}
if ($rfc2231encoding_is_used{$param}) {
my($enc, $lang, $val) = rfc2231decode($params{$param});
if (defined $enc) {
# re-encode as QP, preserving charset and language info
$val =~ s{([=?_\x00-\x1F\x7F-\xFF])}
{sprintf("=%02X", ord($1))}eg;
$val =~ tr/ /_/;
# RFC 2231 section 5: Language specification in Encoded Words
$enc .= '*' . $lang if defined $lang && $lang ne '';
$params{$param} = '=?' . $enc . '?Q?' . $val . '?=';
}
}
debug " field param <$param> = <$params{$param}>";
}
# Done:
\%params;
}
#------------------------------
=item parse STRING
I<Class/instance method.>
Parse the string into the instance. Any previous information is wiped.
The self object is returned.
May also be used as a constructor.
=cut
sub parse {
my ($self, $string) = @_;
# Allow use as constructor, for MIME::Head:
ref($self) or $self = bless({}, $self);
# Get params, and stuff them into the self object:
$self->set($self->parse_params($string));
}
#------------------------------
=item param PARAMNAME,[VALUE]
I<Instance method.>
Return the given parameter, or undef if it isn't there.
With argument, set the parameter to that VALUE.
The PARAMNAME is case-insensitive. A "_" refers to the "default" parameter.
=cut
sub param {
my ($self, $paramname, $value) = @_;
$paramname = lc($paramname);
$self->{$paramname} = $value if (@_ > 2);
$self->{$paramname}
}
#------------------------------
=item paramstr PARAMNAME,[VALUE]
I<Instance method.>
Like param(): return the given parameter, or I<empty> if it isn't there.
With argument, set the parameter to that VALUE.
The PARAMNAME is case-insensitive. A "_" refers to the "default" parameter.
=cut
sub paramstr {
my $val = shift->param(@_);
(defined($val) ? $val : '');
}
#------------------------------
=item stringify
I<Instance method.>
Convert the field to a string, and return it.
=cut
sub stringify {
my $self = shift;
my ($key, $val);
my $str = $self->{'_'}; # default subfield
foreach $key (sort keys %$self) {
next if ($key !~ /^[a-z][a-z-_0-9]*$/); # only lowercase ones!
defined($val = $self->{$key}) or next;
$val =~ s/(["\\])/\\$1/g;
$str .= qq{; $key="$val"};
}
$str;
}
#------------------------------
=item tag
I<Instance method, abstract.>
Return the tag for this field.
=cut
sub tag { '' }
=back
=head1 SEE ALSO
L<Mail::Field>
=cut
#------------------------------
1;

View File

@ -0,0 +1,930 @@
package MIME::Head;
use MIME::WordDecoder;
=head1 NAME
MIME::Head - MIME message header (a subclass of Mail::Header)
=head1 SYNOPSIS
Before reading further, you should see L<MIME::Tools> to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.
Ready? Ok...
=head2 Construction
### Create a new, empty header, and populate it manually:
$head = MIME::Head->new;
$head->replace('content-type', 'text/plain; charset=US-ASCII');
$head->replace('content-length', $len);
### Parse a new header from a filehandle:
$head = MIME::Head->read(\*STDIN);
### Parse a new header from a file, or a readable pipe:
$testhead = MIME::Head->from_file("/tmp/test.hdr");
$a_b_head = MIME::Head->from_file("cat a.hdr b.hdr |");
=head2 Output
### Output to filehandle:
$head->print(\*STDOUT);
### Output as string:
print STDOUT $head->as_string;
print STDOUT $head->stringify;
=head2 Getting field contents
### Is this a reply?
$is_reply = 1 if ($head->get('Subject') =~ /^Re: /);
### Get receipt information:
print "Last received from: ", $head->get('Received', 0);
@all_received = $head->get('Received');
### Print the subject, or the empty string if none:
print "Subject: ", $head->get('Subject',0);
### Too many hops? Count 'em and see!
if ($head->count('Received') > 5) { ...
### Test whether a given field exists
warn "missing subject!" if (! $head->count('subject'));
=head2 Setting field contents
### Declare this to be an HTML header:
$head->replace('Content-type', 'text/html');
=head2 Manipulating field contents
### Get rid of internal newlines in fields:
$head->unfold;
### Decode any Q- or B-encoded-text in fields (DEPRECATED):
$head->decode;
=head2 Getting high-level MIME information
### Get/set a given MIME attribute:
unless ($charset = $head->mime_attr('content-type.charset')) {
$head->mime_attr("content-type.charset" => "US-ASCII");
}
### The content type (e.g., "text/html"):
$mime_type = $head->mime_type;
### The content transfer encoding (e.g., "quoted-printable"):
$mime_encoding = $head->mime_encoding;
### The recommended name when extracted:
$file_name = $head->recommended_filename;
### The boundary text, for multipart messages:
$boundary = $head->multipart_boundary;
=head1 DESCRIPTION
A class for parsing in and manipulating RFC-822 message headers, with
some methods geared towards standard (and not so standard) MIME fields
as specified in the various I<Multipurpose Internet Mail Extensions>
RFCs (starting with RFC 2045)
=head1 PUBLIC INTERFACE
=cut
#------------------------------
require 5.002;
### Pragmas:
use strict;
use vars qw($VERSION @ISA @EXPORT_OK);
### System modules:
use IO::File;
### Other modules:
use Mail::Header 1.09 ();
use Mail::Field 1.05 ();
### Kit modules:
use MIME::Words qw(:all);
use MIME::Tools qw(:config :msgs);
use MIME::Field::ParamVal;
use MIME::Field::ConTraEnc;
use MIME::Field::ContDisp;
use MIME::Field::ContType;
@ISA = qw(Mail::Header);
#------------------------------
#
# Public globals...
#
#------------------------------
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
### Sanity (we put this test after our own version, for CPAN::):
use Mail::Header 1.06 ();
#------------------------------
=head2 Creation, input, and output
=over 4
=cut
#------------------------------
#------------------------------
=item new [ARG],[OPTIONS]
I<Class method, inherited.>
Creates a new header object. Arguments are the same as those in the
superclass.
=cut
sub new {
my $class = shift;
bless Mail::Header->new(@_), $class;
}
#------------------------------
=item from_file EXPR,OPTIONS
I<Class or instance method>.
For convenience, you can use this to parse a header object in from EXPR,
which may actually be any expression that can be sent to open() so as to
return a readable filehandle. The "file" will be opened, read, and then
closed:
### Create a new header by parsing in a file:
my $head = MIME::Head->from_file("/tmp/test.hdr");
Since this method can function as either a class constructor I<or>
an instance initializer, the above is exactly equivalent to:
### Create a new header by parsing in a file:
my $head = MIME::Head->new->from_file("/tmp/test.hdr");
On success, the object will be returned; on failure, the undefined value.
The OPTIONS are the same as in new(), and are passed into new()
if this is invoked as a class method.
B<Note:> This is really just a convenience front-end onto C<read()>,
provided mostly for backwards-compatibility with MIME-parser 1.0.
=cut
sub from_file {
my ($self, $file, @opts) = @_; ### at this point, $self is inst. or class!
my $class = ref($self) ? ref($self) : $self;
### Parse:
my $fh = IO::File->new($file, '<') or return error("open $file: $!");
$fh->binmode() or return error("binmode $file: $!"); # we expect to have \r\n at line ends, and want to keep 'em.
$self = $class->new($fh, @opts); ### now, $self is instance or undef
$fh->close or return error("close $file: $!");
$self;
}
#------------------------------
=item read FILEHANDLE
I<Instance (or class) method.>
This initializes a header object by reading it in from a FILEHANDLE,
until the terminating blank line is encountered.
A syntax error or end-of-stream will also halt processing.
Supply this routine with a reference to a filehandle glob; e.g., C<\*STDIN>:
### Create a new header by parsing in STDIN:
$head->read(\*STDIN);
On success, the self object will be returned; on failure, a false value.
B<Note:> in the MIME world, it is perfectly legal for a header to be
empty, consisting of nothing but the terminating blank line. Thus,
we can't just use the formula that "no tags equals error".
B<Warning:> as of the time of this writing, Mail::Header::read did not flag
either syntax errors or unexpected end-of-file conditions (an EOF
before the terminating blank line). MIME::ParserBase takes this
into account.
=cut
sub read {
my $self = shift; ### either instance or class!
ref($self) or $self = $self->new; ### if used as class method, make new
$self->SUPER::read(@_);
}
#------------------------------
=back
=head2 Getting/setting fields
The following are methods related to retrieving and modifying the header
fields. Some are inherited from Mail::Header, but I've kept the
documentation around for convenience.
=over 4
=cut
#------------------------------
#------------------------------
=item add TAG,TEXT,[INDEX]
I<Instance method, inherited.>
Add a new occurrence of the field named TAG, given by TEXT:
### Add the trace information:
$head->add('Received',
'from eryq.pr.mcs.net by gonzo.net with smtp');
Normally, the new occurrence will be I<appended> to the existing
occurrences. However, if the optional INDEX argument is 0, then the
new occurrence will be I<prepended>. If you want to be I<explicit>
about appending, specify an INDEX of -1.
B<Warning>: this method always adds new occurrences; it doesn't overwrite
any existing occurrences... so if you just want to I<change> the value
of a field (creating it if necessary), then you probably B<don't> want to use
this method: consider using C<replace()> instead.
=cut
### Inherited.
#------------------------------
#
# copy
#
# Instance method, DEPRECATED.
# Duplicate the object.
#
sub copy {
usage "deprecated: use dup() instead.";
shift->dup(@_);
}
#------------------------------
=item count TAG
I<Instance method, inherited.>
Returns the number of occurrences of a field; in a boolean context, this
tells you whether a given field exists:
### Was a "Subject:" field given?
$subject_was_given = $head->count('subject');
The TAG is treated in a case-insensitive manner.
This method returns some false value if the field doesn't exist,
and some true value if it does.
=cut
### Inherited.
#------------------------------
=item decode [FORCE]
I<Instance method, DEPRECATED.>
Go through all the header fields, looking for RFC 1522 / RFC 2047 style
"Q" (quoted-printable, sort of) or "B" (base64) encoding, and decode
them in-place. Fellow Americans, you probably don't know what the hell
I'm talking about. Europeans, Russians, et al, you probably do.
C<:-)>.
B<This method has been deprecated.>
See L<MIME::Parser/decode_headers> for the full reasons.
If you absolutely must use it and don't like the warning, then
provide a FORCE:
"I_NEED_TO_FIX_THIS"
Just shut up and do it. Not recommended.
Provided only for those who need to keep old scripts functioning.
"I_KNOW_WHAT_I_AM_DOING"
Just shut up and do it. Not recommended.
Provided for those who REALLY know what they are doing.
B<What this method does.>
For an example, let's consider a valid email header you might get:
From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
=?US-ASCII?Q?.._cool!?=
That basically decodes to (sorry, I can only approximate the
Latin characters with 7 bit sequences /o and 'e):
From: Keith Moore <moore@cs.utk.edu>
To: Keld J/orn Simonsen <keld@dkuug.dk>
CC: Andr'e Pirard <PIRARD@vm1.ulg.ac.be>
Subject: If you can read this you understand the example... cool!
B<Note:> currently, the decodings are done without regard to the
character set: thus, the Q-encoding C<=F8> is simply translated to the
octet (hexadecimal C<F8>), period. For piece-by-piece decoding
of a given field, you want the array context of
C<MIME::Words::decode_mimewords()>.
B<Warning:> the CRLF+SPACE separator that splits up long encoded words
into shorter sequences (see the Subject: example above) gets lost
when the field is unfolded, and so decoding after unfolding causes
a spurious space to be left in the field.
I<THEREFORE: if you're going to decode, do so BEFORE unfolding!>
This method returns the self object.
I<Thanks to Kent Boortz for providing the idea, and the baseline
RFC-1522-decoding code.>
=cut
sub decode {
my $self = shift;
### Warn if necessary:
my $force = shift || 0;
unless (($force eq "I_NEED_TO_FIX_THIS") ||
($force eq "I_KNOW_WHAT_I_AM_DOING")) {
usage "decode is deprecated for safety";
}
my ($tag, $i, @decoded);
foreach $tag ($self->tags) {
@decoded = map { scalar(decode_mimewords($_, Field=>$tag))
} $self->get_all($tag);
for ($i = 0; $i < @decoded; $i++) {
$self->replace($tag, $decoded[$i], $i);
}
}
$self->{MH_Decoded} = 1;
$self;
}
#------------------------------
=item delete TAG,[INDEX]
I<Instance method, inherited.>
Delete all occurrences of the field named TAG.
### Remove some MIME information:
$head->delete('MIME-Version');
$head->delete('Content-type');
=cut
### Inherited
#------------------------------
#
# exists
#
sub exists {
usage "deprecated; use count() instead";
shift->count(@_);
}
#------------------------------
#
# fields
#
sub fields {
usage "deprecated: use tags() instead",
shift->tags(@_);
}
#------------------------------
=item get TAG,[INDEX]
I<Instance method, inherited.>
Get the contents of field TAG.
If a B<numeric INDEX> is given, returns the occurrence at that index,
or undef if not present:
### Print the first and last 'Received:' entries (explicitly):
print "First, or most recent: ", $head->get('received', 0);
print "Last, or least recent: ", $head->get('received',-1);
If B<no INDEX> is given, but invoked in a B<scalar> context, then
INDEX simply defaults to 0:
### Get the first 'Received:' entry (implicitly):
my $most_recent = $head->get('received');
If B<no INDEX> is given, and invoked in an B<array> context, then
I<all> occurrences of the field are returned:
### Get all 'Received:' entries:
my @all_received = $head->get('received');
B<NOTE>: The header(s) returned may end with a newline. If you don't
want this, then B<chomp> the return value.
=cut
### Inherited.
#------------------------------
=item get_all FIELD
I<Instance method.>
Returns the list of I<all> occurrences of the field, or the
empty list if the field is not present:
### How did it get here?
@history = $head->get_all('Received');
B<Note:> I had originally experimented with having C<get()> return all
occurrences when invoked in an array context... but that causes a lot of
accidents when you get careless and do stuff like this:
print "\u$field: ", $head->get($field);
It also made the intuitive behaviour unclear if the INDEX argument
was given in an array context. So I opted for an explicit approach
to asking for all occurrences.
=cut
sub get_all {
my ($self, $tag) = @_;
$self->count($tag) or return (); ### empty if doesn't exist
($self->get($tag));
}
#------------------------------
#
# original_text
#
# Instance method, DEPRECATED.
# Return an approximation of the original text.
#
sub original_text {
usage "deprecated: use stringify() instead";
shift->stringify(@_);
}
#------------------------------
=item print [OUTSTREAM]
I<Instance method, override.>
Print the header out to the given OUTSTREAM, or the currently-selected
filehandle if none. The OUTSTREAM may be a filehandle, or any object
that responds to a print() message.
The override actually lets you print to any object that responds to
a print() method. This is vital for outputting MIME entities to scalars.
Also, it defaults to the I<currently-selected> filehandle if none is given
(not STDOUT!), so I<please> supply a filehandle to prevent confusion.
=cut
sub print {
my ($self, $fh) = @_;
$fh ||= select;
$fh->print($self->as_string);
}
#------------------------------
#
# set TAG,TEXT
#
# Instance method, DEPRECATED.
# Set the field named TAG to [the single occurrence given by the TEXT.
#
sub set {
my $self = shift;
usage "deprecated: use the replace() method instead.";
$self->replace(@_);
}
#------------------------------
=item stringify
I<Instance method.>
Return the header as a string. You can also invoke it as C<as_string>.
If you set the variable $MIME::Entity::BOUNDARY_DELIMITER to a string,
that string will be used as line-end delimiter. If it is not set,
the line ending will be a newline character (\n)
=cut
sub stringify {
my $self = shift; ### build clean header, and output...
my @header = grep {defined($_) ? $_ : ()} @{$self->header};
my $header_delimiter = $MIME::Entity::BOUNDARY_DELIMITER || "\n";
join "", map { /\n$/ ? substr($_, 0, -1) . $header_delimiter : $_ . $header_delimiter } @header;
}
sub as_string { shift->stringify(@_) }
#------------------------------
=item unfold [FIELD]
I<Instance method, inherited.>
Unfold (remove newlines in) the text of all occurrences of the given FIELD.
If the FIELD is omitted, I<all> fields are unfolded.
Returns the "self" object.
=cut
### Inherited
#------------------------------
=back
=head2 MIME-specific methods
All of the following methods extract information from the following fields:
Content-type
Content-transfer-encoding
Content-disposition
Be aware that they do not just return the raw contents of those fields,
and in some cases they will fill in sensible (I hope) default values.
Use C<get()> or C<mime_attr()> if you need to grab and process the
raw field text.
B<Note:> some of these methods are provided both as a convenience and
for backwards-compatibility only, while others (like
recommended_filename()) I<really do have to be in MIME::Head to work
properly,> since they look for their value in more than one field.
However, if you know that a value is restricted to a single
field, you should really use the Mail::Field interface to get it.
=over 4
=cut
#------------------------------
#------------------------------
#
# params TAG
#
# Instance method, DEPRECATED.
# Extract parameter info from a structured field, and return
# it as a hash reference. Provided for 1.0 compatibility only!
# Use the new MIME::Field interface classes (subclasses of Mail::Field).
sub params {
my ($self, $tag) = @_;
usage "deprecated: use the MIME::Field interface classes from now on!";
return MIME::Field::ParamVal->parse_params($self->get($tag,0));
}
#------------------------------
=item mime_attr ATTR,[VALUE]
A quick-and-easy interface to set/get the attributes in structured
MIME fields:
$head->mime_attr("content-type" => "text/html");
$head->mime_attr("content-type.charset" => "US-ASCII");
$head->mime_attr("content-type.name" => "homepage.html");
This would cause the final output to look something like this:
Content-type: text/html; charset=US-ASCII; name="homepage.html"
Note that the special empty sub-field tag indicates the anonymous
first sub-field.
B<Giving VALUE as undefined> will cause the contents of the named subfield
to be deleted:
$head->mime_attr("content-type.charset" => undef);
B<Supplying no VALUE argument> just returns the attribute's value,
or undefined if it isn't there:
$type = $head->mime_attr("content-type"); ### text/html
$name = $head->mime_attr("content-type.name"); ### homepage.html
In all cases, the new/current value is returned.
=cut
sub mime_attr {
my ($self, $attr, $value) = @_;
### Break attribute name up:
my ($tag, $subtag) = split /\./, $attr;
$subtag ||= '_';
### Set or get?
my $field = MIME::Field::ParamVal->parse($self->get($tag, 0));
if (@_ > 2) { ### set it:
$field->param($subtag, $value); ### set subfield
$self->replace($tag, $field->stringify); ### replace!
return $value;
}
else { ### get it:
return $field->param($subtag);
}
}
#------------------------------
=item mime_encoding
I<Instance method.>
Try I<real hard> to determine the content transfer encoding
(e.g., C<"base64">, C<"binary">), which is returned in all-lowercase.
If no encoding could be found, the default of C<"7bit"> is returned
I quote from RFC 2045 section 6.1:
This is the default value -- that is, "Content-Transfer-Encoding: 7BIT"
is assumed if the Content-Transfer-Encoding header field is not present.
I do one other form of fixup: "7_bit", "7-bit", and "7 bit" are
corrected to "7bit"; likewise for "8bit".
=cut
sub mime_encoding {
my $self = shift;
my $enc = lc($self->mime_attr('content-transfer-encoding') || '7bit');
$enc =~ s{^([78])[ _-]bit\Z}{$1bit};
$enc;
}
#------------------------------
=item mime_type [DEFAULT]
I<Instance method.>
Try C<real hard> to determine the content type (e.g., C<"text/plain">,
C<"image/gif">, C<"x-weird-type">, which is returned in all-lowercase.
"Real hard" means that if no content type could be found, the default
(usually C<"text/plain">) is returned. From RFC 2045 section 5.2:
Default RFC 822 messages without a MIME Content-Type header are
taken by this protocol to be plain text in the US-ASCII character
set, which can be explicitly specified as:
Content-type: text/plain; charset=us-ascii
This default is assumed if no Content-Type header field is specified.
Unless this is a part of a "multipart/digest", in which case
"message/rfc822" is the default. Note that you can also I<set> the
default, but you shouldn't: normally only the MIME parser uses this
feature.
=cut
sub mime_type {
my ($self, $default) = @_;
$self->{MIH_DefaultType} = $default if @_ > 1;
my $s = $self->mime_attr('content-type') ||
$self->{MIH_DefaultType} ||
'text/plain';
# avoid [perl #87336] bug, lc laundering tainted data
return lc($s) if $] <= 5.008 || $] >= 5.014;
$s =~ tr/A-Z/a-z/;
$s;
}
#------------------------------
=item multipart_boundary
I<Instance method.>
If this is a header for a multipart message, return the
"encapsulation boundary" used to separate the parts. The boundary
is returned exactly as given in the C<Content-type:> field; that
is, the leading double-hyphen (C<-->) is I<not> prepended.
Well, I<almost> exactly... this passage from RFC 2046 dictates
that we remove any trailing spaces:
If a boundary appears to end with white space, the white space
must be presumed to have been added by a gateway, and must be deleted.
Returns undef (B<not> the empty string) if either the message is not
multipart or if there is no specified boundary.
=cut
sub multipart_boundary {
my $self = shift;
my $value = $self->mime_attr('content-type.boundary');
(!defined($value)) ? undef : $value;
}
#------------------------------
=item recommended_filename
I<Instance method.>
Return the recommended external filename. This is used when
extracting the data from the MIME stream. The filename is always
returned as a string in Perl's internal format (the UTF8 flag may be on!)
Returns undef if no filename could be suggested.
=cut
sub recommended_filename
{
my $self = shift;
# Try these headers in order, taking the first defined,
# non-blank one we find.
my $wd = supported MIME::WordDecoder 'UTF-8';
foreach my $attr_name ( qw( content-disposition.filename content-type.name ) ) {
my $value = $self->mime_attr( $attr_name );
if ( defined $value
&& $value ne ''
&& $value =~ /\S/ ) {
return $wd->decode($value);
}
}
return undef;
}
#------------------------------
=back
=cut
#------------------------------
#
# tweak_FROM_parsing
#
# DEPRECATED. Use the inherited mail_from() class method now.
sub tweak_FROM_parsing {
my $self = shift;
usage "deprecated. Use mail_from() instead.";
$self->mail_from(@_);
}
__END__
#------------------------------
=head1 NOTES
=over 4
=item Why have separate objects for the entity, head, and body?
See the documentation for the MIME-tools distribution
for the rationale behind this decision.
=item Why assume that MIME headers are email headers?
I quote from Achim Bohnet, who gave feedback on v.1.9 (I think
he's using the word "header" where I would use "field"; e.g.,
to refer to "Subject:", "Content-type:", etc.):
There is also IMHO no requirement [for] MIME::Heads to look
like [email] headers; so to speak, the MIME::Head [simply stores]
the attributes of a complex object, e.g.:
new MIME::Head type => "text/plain",
charset => ...,
disposition => ..., ... ;
I agree in principle, but (alas and dammit) RFC 2045 says otherwise.
RFC 2045 [MIME] headers are a syntactic subset of RFC-822 [email] headers.
In my mind's eye, I see an abstract class, call it MIME::Attrs, which does
what Achim suggests... so you could say:
my $attrs = new MIME::Attrs type => "text/plain",
charset => ...,
disposition => ..., ... ;
We could even make it a superclass of MIME::Head: that way, MIME::Head
would have to implement its interface, I<and> allow itself to be
initialized from a MIME::Attrs object.
However, when you read RFC 2045, you begin to see how much MIME information
is organized by its presence in particular fields. I imagine that we'd
begin to mirror the structure of RFC 2045 fields and subfields to such
a degree that this might not give us a tremendous gain over just
having MIME::Head.
=item Why all this "occurrence" and "index" jazz? Isn't every field unique?
Aaaaaaaaaahh....no.
Looking at a typical mail message header, it is sooooooo tempting to just
store the fields as a hash of strings, one string per hash entry.
Unfortunately, there's the little matter of the C<Received:> field,
which (unlike C<From:>, C<To:>, etc.) will often have multiple
occurrences; e.g.:
Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
(Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C;
Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov
(5.65/Ultrix3.0-C) id AA13596;
Thu, 21 Dec 95 17:20:38 -0500
Received: (from eryq@localhost) by rhine.gsfc.nasa.gov
(8.6.12/8.6.12) id RAA28069;
Thu, 21 Dec 1995 17:27:54 -0500
Date: Thu, 21 Dec 1995 17:27:54 -0500
From: Eryq <eryq@rhine.gsfc.nasa.gov>
Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
To: eryq@eryq.pr.mcs.net
Subject: Stuff and things
The C<Received:> field is used for tracing message routes, and although
it's not generally used for anything other than human debugging, I
didn't want to inconvenience anyone who actually wanted to get at that
information.
I also didn't want to make this a special case; after all, who
knows what other fields could have multiple occurrences in the
future? So, clearly, multiple entries had to somehow be stored
multiple times... and the different occurrences had to be retrievable.
=back
=head1 SEE ALSO
L<Mail::Header>, L<Mail::Field>, L<MIME::Words>, L<MIME::Tools>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
The more-comprehensive filename extraction is courtesy of
Lee E. Brotzman, Advanced Data Solutions.
=cut
1;

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,940 @@
package MIME::Parser::Filer;
=head1 NAME
MIME::Parser::Filer - manage file-output of the parser
=head1 SYNOPSIS
Before reading further, you should see L<MIME::Parser> to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.
Ready? Ok... now read L<"DESCRIPTION"> below, and everything else
should make sense.
=head2 Public interface
### Create a "filer" of the desired class:
my $filer = MIME::Parser::FileInto->new($dir);
my $filer = MIME::Parser::FileUnder->new($basedir);
...
### Want added security? Don't let outsiders name your files:
$filer->ignore_filename(1);
### Prepare for the parsing of a new top-level message:
$filer->init_parse;
### Return the path where this message's data should be placed:
$path = $filer->output_path($head);
=head2 Semi-public interface
These methods might be overridden or ignored in some subclasses,
so they don't all make sense in all circumstances:
### Tweak the mapping from content-type to extension:
$emap = $filer->output_extension_map;
$emap->{"text/html"} = ".htm";
=head1 DESCRIPTION
=head2 How this class is used when parsing
When a MIME::Parser decides that it wants to output a file to disk,
it uses its "Filer" object -- an instance of a MIME::Parser::Filer
subclass -- to determine where to put the file.
Every parser has a single Filer object, which it uses for all
parsing. You can get the Filer for a given $parser like this:
$filer = $parser->filer;
At the beginning of each C<parse()>, the filer's internal state
is reset by the parser:
$parser->filer->init_parse;
The parser can then get a path for each entity in the message
by handing that entity's header (a MIME::Head) to the filer
and having it do the work, like this:
$new_file = $parser->filer->output_path($head);
Since it's nice to be able to clean up after a parse (especially
a failed parse), the parser tells the filer when it has actually
used a path:
$parser->filer->purgeable($new_file);
Then, if you want to clean up the files which were created for a
particular parse (and also any directories that the Filer created),
you would do this:
$parser->filer->purge;
=head2 Writing your own subclasses
There are two standard "Filer" subclasses (see below):
B<MIME::Parser::FileInto>, which throws all files from all parses
into the same directory, and B<MIME::Parser::FileUnder> (preferred), which
creates a subdirectory for each message. Hopefully, these will be
sufficient for most uses, but just in case...
The only method you have to override is L<output_path()|/output_path>:
$filer->output_path($head);
This method is invoked by MIME::Parser when it wants to put a
decoded message body in an output file. The method should return a
path to the file to create. Failure is indicated by throwing an
exception.
The path returned by C<output_path()> should be "ready for open()":
any necessary parent directories need to exist at that point.
These directories can be created by the Filer, if course, and they
should be marked as B<purgeable()> if a purge should delete them.
Actually, if your issue is more I<where> the files go than
what they're named, you can use the default L<output_path()|/output_path>
method and just override one of its components:
$dir = $filer->output_dir($head);
$name = $filer->output_filename($head);
...
=head1 PUBLIC INTERFACE
=head2 MIME::Parser::Filer
This is the abstract superclass of all "filer" objects.
=over 4
=cut
use strict;
### Kit modules:
use MIME::Tools qw(:msgtypes);
use File::Spec;
use File::Path qw(rmtree);
use MIME::WordDecoder;
### Output path uniquifiers:
my $GFileNo = 0;
my $GSubdirNo = 0;
### Map content-type to extension.
### If we can't map "major/minor", we try "major/*", then use "*/*".
my %DefaultTypeToExt =
qw(
application/andrew-inset .ez
application/octet-stream .bin
application/oda .oda
application/pdf .pdf
application/pgp .pgp
application/postscript .ps
application/rtf .rtf
application/x-bcpio .bcpio
application/x-chess-pgn .pgn
application/x-cpio .cpio
application/x-csh .csh
application/x-dvi .dvi
application/x-gtar .gtar
application/x-gunzip .gz
application/x-hdf .hdf
application/x-latex .latex
application/x-mif .mif
application/x-netcdf .cdf
application/x-netcdf .nc
application/x-sh .sh
application/x-shar .shar
application/x-sv4cpio .sv4cpio
application/x-sv4crc .sv4crc
application/x-tar .tar
application/x-tcl .tcl
application/x-tex .tex
application/x-texinfo .texi
application/x-troff .roff
application/x-troff .tr
application/x-troff-man .man
application/x-troff-me .me
application/x-troff-ms .ms
application/x-ustar .ustar
application/x-wais-source .src
application/zip .zip
audio/basic .snd
audio/ulaw .au
audio/x-aiff .aiff
audio/x-wav .wav
image/gif .gif
image/ief .ief
image/jpeg .jpg
image/png .png
image/xbm .xbm
image/tiff .tif
image/x-cmu-raster .ras
image/x-portable-anymap .pnm
image/x-portable-bitmap .pbm
image/x-portable-graymap .pgm
image/x-portable-pixmap .ppm
image/x-rgb .rgb
image/x-xbitmap .xbm
image/x-xpixmap .xpm
image/x-xwindowdump .xwd
text/* .txt
text/html .html
text/plain .txt
text/richtext .rtx
text/tab-separated-values .tsv
text/x-setext .etx
text/x-vcard .vcf
video/mpeg .mpg
video/quicktime .mov
video/x-msvideo .avi
video/x-sgi-movie .movie
message/* .msg
*/* .dat
);
#------------------------------
=item new INITARGS...
I<Class method, constructor.>
Create a new outputter for the given parser.
Any subsequent arguments are given to init(), which subclasses should
override for their own use (the default init does nothing).
=cut
sub new {
my ($class, @initargs) = @_;
my $self = bless {
MPF_Prefix => "msg",
MPF_Dir => ".",
MPF_Ext => { %DefaultTypeToExt },
MPF_Purgeable => [], ### files created by the last parse
MPF_MaxName => 80, ### max filename before treated as evil
MPF_TrimRoot => 14, ### trim root to this length
MPF_TrimExt => 3, ### trim extension to this length
}, $class;
$self->init(@initargs);
$self;
}
sub init {
### no-op
}
#------------------------------
#
# cleanup_dir
#
# Instance method, private.
# Cleanup a directory, defaulting empty to "."
#
sub cleanup_dir {
my ($self, $dir) = @_;
$dir = '.' if (!defined($dir) || ($dir eq '')); # coerce empty to "."
$dir = '/.' if ($dir eq '/'); # coerce "/" so "$dir/$filename" works
$dir =~ s|/$||; # be nice: get rid of any trailing "/"
$dir;
}
#------------------------------
=item results RESULTS
I<Instance method.>
Link this filer to a MIME::Parser::Results object which will
tally the messages. Notice that we avoid linking it to the
parser to avoid circular reference!
=cut
sub results {
my ($self, $results) = @_;
$self->{MPF_Results} = $results if (@_ > 1);
$self->{MPF_Results};
}
### Log debug messages:
sub debug {
my $self = shift;
if (MIME::Tools->debugging()) {
if ($self->{MPF_Results}) {
unshift @_, $self->{MPF_Results}->indent;
$self->{MPF_Results}->msg($M_DEBUG, @_);
}
MIME::Tools::debug(@_);
}
}
### Log warning messages:
sub whine {
my $self = shift;
if ($self->{MPF_Results}) {
unshift @_, $self->{MPF_Results}->indent;
$self->{MPF_Results}->msg($M_WARNING, @_);
}
MIME::Tools::whine(@_);
}
#------------------------------
=item init_parse
I<Instance method.>
Prepare to start parsing a new message.
Subclasses should always be sure to invoke the inherited method.
=cut
sub init_parse {
my $self = shift;
$self->{MPF_Purgeable} = [];
}
#------------------------------
=item evil_filename FILENAME
I<Instance method.>
Is this an evil filename; i.e., one which should not be used
in generating a disk file name? It is if any of these are true:
* it is empty or entirely whitespace
* it contains leading or trailing whitespace
* it is a string of dots: ".", "..", etc.
* it contains characters not in the set: "A" - "Z", "a" - "z",
"0" - "9", "-", "_", "+", "=", ".", ",", "@", "#",
"$", and " ".
* it is too long
If you just want to change this behavior, you should override
this method in the subclass of MIME::Parser::Filer that you use.
B<Warning:> at the time this method is invoked, the FILENAME has
already been unmime'd into the local character set.
If you're using any character set other than ASCII, ISO-8859-*,
or UTF-8, the interpretation of the "path" characters might be
very different, and you will probably need to override this method.
See L<MIME::WordDecoder/unmime> for more details.
B<Note:> subclasses of MIME::Parser::Filer which override
output_path() might not consult this method; note, however, that
the built-in subclasses do consult it.
I<Thanks to Andrew Pimlott for finding a real dumb bug in the original
version. Thanks to Nickolay Saukh for noting that evil is in the
eye of the beholder.>
=cut
sub evil_filename {
my ($self, $name) = @_;
$self->debug("is this evil? '$name'");
return 1 if (!defined($name) or ($name eq '')); ### empty
return 1 if ($name =~ m{(^\s)|(\s+\Z)}); ### leading/trailing whitespace
return 1 if ($name =~ m{^\.+\Z}); ### dots
return 1 if ($name =~ /[^-A-Z0-9_+=.,@\#\$\% ]/i); # Only allow good chars
return 1 if ($self->{MPF_MaxName} and
(length($name) > $self->{MPF_MaxName}));
$self->debug("it's ok");
0;
}
#------------------------------
=item exorcise_filename FILENAME
I<Instance method.>
If a given filename is evil (see L</evil_filename>) we try to
rescue it by performing some basic operations: shortening it,
removing bad characters, etc., and checking each against
evil_filename().
Returns the exorcised filename (which is guaranteed to not
be evil), or undef if it could not be salvaged.
B<Warning:> at the time this method is invoked, the FILENAME has
already been unmime'd into the local character set.
If you're using anything character set other than ASCII, ISO-8859-*,
or UTF-8, the interpretation of the "path" characters might be very
very different, and you will probably need to override this method.
See L<MIME::WordDecoder/unmime> for more details.
=cut
sub exorcise_filename {
my ($self, $fname) = @_;
### Isolate to last path element:
my $last = $fname;
### Path separators are / or \
$last =~ s{^.*[/\\]}{};
### Convert semi-evil characters to underscores
$last =~ s/[\/\\\[\]:]/_/g;
if ($last and !$self->evil_filename($last)) {
$self->debug("looks like I can use the last path element");
return $last;
}
### Break last element into root and extension, and truncate:
my ($root, $ext) = (($last =~ /^(.*)\.([^\.]+)\Z/)
? ($1, $2)
: ($last, ''));
### Delete leading and trailing whitespace
$root =~ s/^\s+//;
$ext =~ s/\s+$//;
$root = substr($root, 0, ($self->{MPF_TrimRoot} || 14));
$ext = substr($ext, 0, ($self->{MPF_TrimExt} || 3));
$ext =~ /^\w+$/ or $ext = "dat";
my $trunc = $root . ($ext ? ".$ext" : '');
if (!$self->evil_filename($trunc)) {
$self->debug("looks like I can use the truncated last path element");
return $trunc;
}
### Remove all bad characters
$trunc =~ s/([^-A-Z0-9_+=.,@\#\$ ])/sprintf("%%%02X", unpack("C", $1))/ige;
if (!$self->evil_filename($trunc)) {
$self->debug("looks like I can use a munged version of the truncated last path element");
return $trunc;
}
### Hope that works:
undef;
}
#------------------------------
=item find_unused_path DIR, FILENAME
I<Instance method, subclasses only.>
We have decided on an output directory and tentative filename,
but there is a chance that it might already exist. Keep
adding a numeric suffix "-1", "-2", etc. to the filename
until an unused path is found, and then return that path.
The suffix is actually added before the first "." in the filename
is there is one; for example:
picture.gif archive.tar.gz readme
picture-1.gif archive-1.tar.gz readme-1
picture-2.gif archive-2.tar.gz readme-2
... ... ...
picture-10.gif
...
This can be a costly operation, and risky if you don't want files
renamed, so it is in your best interest to minimize situations
where these kinds of collisions occur. Unfortunately, if
a multipart message gives all of its parts the same recommended
filename, and you are placing them all in the same directory,
this method might be unavoidable.
=cut
sub find_unused_path {
my ($self, $dir, $fname) = @_;
my $i = 0;
while (1) {
### Create suffixed name (from filename), and see if we can use it:
my $suffix = ($i ? "-$i" : "");
my $sname = $fname; $sname =~ s/^(.*?)(\.|\Z)/$1$suffix$2/;
my $path = File::Spec->catfile($dir, $sname);
if (! -e $path) { ### it's good!
$i and $self->whine("collision with $fname in $dir: using $path");
return $path;
}
$self->debug("$path already taken");
} continue { ++$i; }
}
#------------------------------
=item ignore_filename [YESNO]
I<Instance method.>
Return true if we should always ignore recommended filenames in
messages, choosing instead to always generate our own filenames.
With argument, sets this value.
B<Note:> subclasses of MIME::Parser::Filer which override
output_path() might not honor this setting; note, however, that
the built-in subclasses honor it.
=cut
sub ignore_filename {
my $self = shift;
$self->{MPF_IgnoreFilename} = $_[0] if @_;
$self->{MPF_IgnoreFilename};
}
#------------------------------
=item output_dir HEAD
I<Instance method.>
Return the output directory for the given header.
The default method returns ".".
=cut
sub output_dir {
my ($self, $head) = @_;
return ".";
}
#------------------------------
=item output_filename HEAD
I<Instance method, subclasses only.>
A given recommended filename was either not given, or it was judged
to be evil. Return a fake name, possibly using information in the
message HEADer. Note that this is just the filename, not the full path.
Used by L<output_path()|/output_path>.
If you're using the default C<output_path()>, you probably don't
need to worry about avoiding collisions with existing files;
we take care of that in L<find_unused_path()|/find_unused_path>.
=cut
sub output_filename {
my ($self, $head) = @_;
### Get the recommended name:
my $recommended = $head->recommended_filename;
### Get content type:
my ($type, $subtype) = split m{/}, $head->mime_type; $subtype ||= '';
### Get recommended extension, being quite conservative:
my $recommended_ext = (($recommended and ($recommended =~ m{(\.\w+)\Z}))
? $1
: undef);
### Try and get an extension, honoring a given one first:
my $ext = ($recommended_ext ||
$self->{MPF_Ext}{"$type/$subtype"} ||
$self->{MPF_Ext}{"$type/*"} ||
$self->{MPF_Ext}{"*/*"} ||
".dat");
### Get a prefix:
++$GFileNo;
return ($self->output_prefix . "-$$-$GFileNo$ext");
}
#------------------------------
=item output_prefix [PREFIX]
I<Instance method.>
Get the short string that all filenames for extracted body-parts
will begin with (assuming that there is no better "recommended filename").
The default is F<"msg">.
If PREFIX I<is not> given, the current output prefix is returned.
If PREFIX I<is> given, the output prefix is set to the new value,
and the previous value is returned.
Used by L<output_filename()|/output_filename>.
B<Note:> subclasses of MIME::Parser::Filer which override
output_path() or output_filename() might not honor this setting;
note, however, that the built-in subclasses honor it.
=cut
sub output_prefix {
my ($self, $prefix) = @_;
$self->{MPF_Prefix} = $prefix if (@_ > 1);
$self->{MPF_Prefix};
}
#------------------------------
=item output_type_ext
I<Instance method.>
Return a reference to the hash used by the default
L<output_filename()|/output_filename> for mapping from content-types
to extensions when there is no default extension to use.
$emap = $filer->output_typemap;
$emap->{'text/plain'} = '.txt';
$emap->{'text/html'} = '.html';
$emap->{'text/*'} = '.txt';
$emap->{'*/*'} = '.dat';
B<Note:> subclasses of MIME::Parser::Filer which override
output_path() or output_filename() might not consult this hash;
note, however, that the built-in subclasses consult it.
=cut
sub output_type_ext {
my $self = shift;
return $self->{MPF_Ext};
}
#------------------------------
=item output_path HEAD
I<Instance method, subclasses only.>
Given a MIME head for a file to be extracted, come up with a good
output pathname for the extracted file. This is the only method
you need to worry about if you are building a custom filer.
The default implementation does a lot of work; subclass
implementers I<really> should try to just override its components
instead of the whole thing. It works basically as follows:
$directory = $self->output_dir($head);
$filename = $head->recommended_filename();
if (!$filename or
$self->ignore_filename() or
$self->evil_filename($filename)) {
$filename = $self->output_filename($head);
}
return $self->find_unused_path($directory, $filename);
B<Note:> There are many, many, many ways you might want to control
the naming of files, based on your application. If you don't like
the behavior of this function, you can easily define your own subclass
of MIME::Parser::Filer and override it there.
B<Note:> Nickolay Saukh pointed out that, given the subjective nature of
what is "evil", this function really shouldn't I<warn> about an evil
filename, but maybe just issue a I<debug> message. I considered that,
but then I thought: if debugging were off, people wouldn't know why
(or even if) a given filename had been ignored. In mail robots
that depend on externally-provided filenames, this could cause
hard-to-diagnose problems. So, the message is still a warning.
I<Thanks to Laurent Amon for pointing out problems with the original
implementation, and for making some good suggestions. Thanks also to
Achim Bohnet for pointing out that there should be a hookless, OO way of
overriding the output path.>
=cut
sub output_path {
my ($self, $head) = @_;
### Get the output directory:
my $dir = $self->output_dir($head);
### Get the output filename as UTF-8
my $fname = $head->recommended_filename;
### Can we use it:
if (!defined($fname)) {
$self->debug("no filename recommended: synthesizing our own");
$fname = $self->output_filename($head);
}
elsif ($self->ignore_filename) {
$self->debug("ignoring all external filenames: synthesizing our own");
$fname = $self->output_filename($head);
}
elsif ($self->evil_filename($fname)) {
### Can we save it by just taking the last element?
my $ex = $self->exorcise_filename($fname);
if (defined($ex) and !$self->evil_filename($ex)) {
$self->whine("Provided filename '$fname' is regarded as evil, ",
"but I was able to exorcise it and get something ",
"usable.");
$fname = $ex;
}
else {
$self->whine("Provided filename '$fname' is regarded as evil; ",
"I'm ignoring it and supplying my own.");
$fname = $self->output_filename($head);
}
}
$self->debug("planning to use '$fname'");
### Resolve collisions and return final path:
return $self->find_unused_path($dir, $fname);
}
#------------------------------
=item purge
I<Instance method, final.>
Purge all files/directories created by the last parse.
This method simply goes through the purgeable list in reverse order
(see L</purgeable>) and removes all existing files/directories in it.
You should not need to override this method.
=cut
sub purge {
my ($self) = @_;
foreach my $path (reverse @{$self->{MPF_Purgeable}}) {
(-e $path) or next; ### must check: might delete DIR before DIR/FILE
rmtree($path, 0, 1);
(-e $path) and $self->whine("unable to purge: $path");
}
1;
}
#------------------------------
=item purgeable [FILE]
I<Instance method, final.>
Add FILE to the list of "purgeable" files/directories (those which
will be removed if you do a C<purge()>).
You should not need to override this method.
If FILE is not given, the "purgeable" list is returned.
This may be used for more-sophisticated purging.
As a special case, invoking this method with a FILE that is an
arrayref will replace the purgeable list with a copy of the
array's contents, so [] may be used to clear the list.
Note that the "purgeable" list is cleared when a parser begins a
new parse; therefore, if you want to use purge() to do cleanup,
you I<must> do so I<before> starting a new parse!
=cut
sub purgeable {
my ($self, $path) = @_;
return @{$self->{MPF_Purgeable}} if (@_ == 1);
if (ref($path)) { $self->{MPF_Purgeable} = [ @$path ]; }
else { push @{$self->{MPF_Purgeable}}, $path; }
1;
}
=back
=cut
#------------------------------------------------------------
#------------------------------------------------------------
=head2 MIME::Parser::FileInto
This concrete subclass of MIME::Parser::Filer supports filing
into a given directory.
=over 4
=cut
package MIME::Parser::FileInto;
use strict;
use vars qw(@ISA);
@ISA = qw(MIME::Parser::Filer);
#------------------------------
=item init DIRECTORY
I<Instance method, initiallizer.>
Set the directory where all files will go.
=cut
sub init {
my ($self, $dir) = @_;
$self->{MPFI_Dir} = $self->cleanup_dir($dir);
}
#------------------------------
#
# output_dir HEAD
#
# I<Instance method, concrete override.>
# Return the output directory where the files go.
#
sub output_dir {
shift->{MPFI_Dir};
}
=back
=cut
#------------------------------------------------------------
#------------------------------------------------------------
=head2 MIME::Parser::FileUnder
This concrete subclass of MIME::Parser::Filer supports filing under
a given directory, using one subdirectory per message, but with
all message parts in the same directory.
=over 4
=cut
package MIME::Parser::FileUnder;
use strict;
use vars qw(@ISA);
@ISA = qw(MIME::Parser::Filer);
#------------------------------
=item init BASEDIR, OPTSHASH...
I<Instance method, initiallizer.>
Set the base directory which will contain the message directories.
If used, then each parse of begins by creating a new subdirectory
of BASEDIR where the actual parts of the message are placed.
OPTSHASH can contain the following:
=over 4
=item DirName
Explicitly set the name of the subdirectory which is created.
The default is to use the time, process id, and a sequence number,
but you might want a predictable directory.
=item Purge
Automatically purge the contents of the directory (including all
subdirectories) before each parse. This is really only needed if
using an explicit DirName, and is provided as a convenience only.
Currently we use the 1-arg form of File::Path::rmtree; you should
familiarize yourself with the caveats therein.
=back
The output_dir() will return the path to this message-specific directory
until the next parse is begun, so you can do this:
use File::Path;
$parser->output_under("/tmp");
$ent = eval { $parser->parse_open($msg); }; ### parse
if (!$ent) { ### parse failed
rmtree($parser->output_dir);
die "parse failed: $@";
}
else { ### parse succeeded
...do stuff...
}
=cut
sub init {
my ($self, $basedir, %opts) = @_;
$self->{MPFU_Base} = $self->cleanup_dir($basedir);
$self->{MPFU_DirName} = $opts{DirName};
$self->{MPFU_Purge} = $opts{Purge};
}
#------------------------------
#
# init_parse
#
# I<Instance method, override.>
# Prepare to start parsing a new message.
#
sub init_parse {
my $self = shift;
### Invoke inherited method first!
$self->SUPER::init_parse;
### Determine the subdirectory of their base to use:
my $subdir = (defined($self->{MPFU_DirName})
? $self->{MPFU_DirName}
: ("msg-".scalar(time)."-$$-".$GSubdirNo++));
$self->debug("subdir = $subdir");
### Determine full path to the per-message output directory:
$self->{MPFU_Dir} = File::Spec->catfile($self->{MPFU_Base}, $subdir);
### Remove and re-create the per-message output directory:
rmtree $self->output_dir if $self->{MPFU_Purge};
(-d $self->output_dir) or
mkdir $self->output_dir, 0700 or
die "mkdir ".$self->output_dir.": $!\n";
### Add the per-message output directory to the puregables:
$self->purgeable($self->output_dir);
1;
}
#------------------------------
#
# output_dir HEAD
#
# I<Instance method, concrete override.>
# Return the output directory that we used for the last parse.
#
sub output_dir {
shift->{MPFU_Dir};
}
=back
=cut
1;
__END__
=head1 SEE ALSO
L<MIME::Tools>, L<MIME::Parser>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

View File

@ -0,0 +1,328 @@
package MIME::Parser::Reader;
=head1 NAME
MIME::Parser::Reader - a line-oriented reader for a MIME::Parser
=head1 SYNOPSIS
This module is used internally by MIME::Parser; you probably
don't need to be looking at it at all. But just in case...
### Create a top-level reader, where chunks end at EOF:
$rdr = MIME::Parser::Reader->new();
### Spawn a child reader, where chunks also end at a boundary:
$subrdr = $rdr->spawn->add_boundary($bound);
### Spawn a child reader, where chunks also end at a given string:
$subrdr = $rdr->spawn->add_terminator($string);
### Read until boundary or terminator:
$subrdr->read_chunk($in, $out);
=head1 DESCRIPTION
A line-oriented reader which can deal with virtual end-of-stream
defined by a collection of boundaries.
B<Warning:> this is a private class solely for use by MIME::Parser.
This class has no official public interface
=cut
use strict;
### All possible end-of-line sequences.
### Note that "" is included because last line of stream may have no newline!
my @EOLs = ("", "\r", "\n", "\r\n", "\n\r");
### Long line:
my $LONGLINE = ' ' x 1000;
#------------------------------
#
# new
#
# I<Class method.>
# Construct an empty (top-level) reader.
#
sub new {
my ($class) = @_;
my $eos;
return bless {
Bounds => [],
BH => {},
TH => {},
EOS => \$eos,
}, $class;
}
#------------------------------
#
# spawn
#
# I<Instance method.>
# Return a reader which is mostly a duplicate, except that the EOS
# accumulator is shared.
#
sub spawn {
my $self = shift;
my $dup = bless {}, ref($self);
$dup->{Bounds} = [ @{$self->{Bounds}} ]; ### deep copy
$dup->{BH} = { %{$self->{BH}} }; ### deep copy
$dup->{TH} = { %{$self->{TH}} }; ### deep copy
$dup->{EOS} = $self->{EOS}; ### shallow copy; same ref!
$dup;
}
#------------------------------
#
# add_boundary BOUND
#
# I<Instance method.>
# Let BOUND be the new innermost boundary. Returns self.
#
sub add_boundary {
my ($self, $bound) = @_;
unshift @{$self->{Bounds}}, $bound; ### now at index 0
$self->{BH}{"--$bound"} = "DELIM $bound";
$self->{BH}{"--$bound--"} = "CLOSE $bound";
$self;
}
#------------------------------
#
# add_terminator LINE
#
# I<Instance method.>
# Let LINE be another terminator. Returns self.
#
sub add_terminator {
my ($self, $line) = @_;
foreach (@EOLs) {
$self->{TH}{"$line$_"} = "DONE $line";
}
$self;
}
#------------------------------
#
# has_bounds
#
# I<Instance method.>
# Are there boundaries to contend with?
#
sub has_bounds {
scalar(@{shift->{Bounds}});
}
#------------------------------
#
# depth
#
# I<Instance method.>
# How many levels are there?
#
sub depth {
scalar(@{shift->{Bounds}});
}
#------------------------------
#
# eos [EOS]
#
# I<Instance method.>
# Return the last end-of-stream token seen.
# See read_chunk() for what these might be.
#
sub eos {
my $self = shift;
${$self->{EOS}} = $_[0] if @_;
${$self->{EOS}};
}
#------------------------------
#
# eos_type [EOSTOKEN]
#
# I<Instance method.>
# Return the high-level type of the given token (defaults to our token).
#
# DELIM saw an innermost boundary like --xyz
# CLOSE saw an innermost boundary like --xyz--
# DONE callback returned false
# EOF end of file
# EXT saw boundary of some higher-level
#
sub eos_type {
my ($self, $eos) = @_;
$eos = $self->eos if (@_ == 1);
if ($eos =~ /^(DONE|EOF)/) {
return $1;
}
elsif ($eos =~ /^(DELIM|CLOSE) (.*)$/) {
return (($2 eq $self->{Bounds}[0]) ? $1 : 'EXT');
}
else {
die("internal error: unable to classify boundary token ($eos)");
}
}
#------------------------------
#
# native_handle HANDLE
#
# I<Function.>
# Can we do native i/o on HANDLE? If true, returns the handle
# that will respond to native I/O calls; else, returns undef.
#
sub native_handle {
my $fh = shift;
return $fh if ($fh->isa('IO::File') || $fh->isa('IO::Handle'));
return $fh if (ref $fh eq 'GLOB');
undef;
}
#------------------------------
#
# read_chunk INHANDLE, OUTHANDLE
#
# I<Instance method.>
# Get lines until end-of-stream.
# Returns the terminating-condition token:
#
# DELIM xyz saw boundary line "--xyz"
# CLOSE xyz saw boundary line "--xyz--"
# DONE xyz saw terminator line "xyz"
# EOF end of file
# Parse up to (and including) the boundary, and dump output.
# Follows the RFC 2046 specification, that the CRLF immediately preceding
# the boundary is part of the boundary, NOT part of the input!
#
# NOTE: while parsing bodies, we take care to remember the EXACT end-of-line
# sequence. This is because we *may* be handling 'binary' encoded data, and
# in that case we can't just massage \r\n into \n! Don't worry... if the
# data is styled as '7bit' or '8bit', the "decoder" will massage the CRLF
# for us. For now, we're just trying to chop up the data stream.
# NBK - Oct 12, 1999
# The CRLF at the end of the current line is considered part
# of the boundary. I buffer the current line and output the
# last. I strip the last CRLF when I hit the boundary.
sub read_chunk {
my ($self, $in, $out, $keep_newline, $normalize_newlines) = @_;
# If we're parsing a preamble or epilogue, we need to keep the blank line
# that precedes the boundary line.
$keep_newline ||= 0;
$normalize_newlines ||= 0;
### Init:
my %bh = %{$self->{BH}};
my %th = %{$self->{TH}}; my $thx = keys %th;
local $_ = $LONGLINE;
my $maybe;
my $last = '';
my $eos = '';
### Determine types:
my $n_in = native_handle($in);
my $n_out = native_handle($out);
### Handle efficiently by type:
if ($n_in) {
if ($n_out) { ### native input, native output [fastest]
while (<$n_in>) {
# Normalize line ending
$_ =~ s/(?:\n\r|\r\n|\r)$/\n/ if $normalize_newlines;
if (substr($_, 0, 2) eq '--') {
($maybe = $_) =~ s/[ \t\r\n]+\Z//;
$bh{$maybe} and do { $eos = $bh{$maybe}; last };
}
$thx and $th{$_} and do { $eos = $th{$_}; last };
print $n_out $last; $last = $_;
}
}
else { ### native input, OO output [slower]
while (<$n_in>) {
# Normalize line ending
$_ =~ s/(?:\n\r|\r\n|\r)$/\n/ if $normalize_newlines;
if (substr($_, 0, 2) eq '--') {
($maybe = $_) =~ s/[ \t\r\n]+\Z//;
$bh{$maybe} and do { $eos = $bh{$maybe}; last };
}
$thx and $th{$_} and do { $eos = $th{$_}; last };
$out->print($last); $last = $_;
}
}
}
else {
if ($n_out) { ### OO input, native output [even slower]
while (defined($_ = $in->getline)) {
# Normalize line ending
$_ =~ s/(?:\n\r|\r\n|\r)$/\n/ if $normalize_newlines;
if (substr($_, 0, 2) eq '--') {
($maybe = $_) =~ s/[ \t\r\n]+\Z//;
$bh{$maybe} and do { $eos = $bh{$maybe}; last };
}
$thx and $th{$_} and do { $eos = $th{$_}; last };
print $n_out $last; $last = $_;
}
}
else { ### OO input, OO output [slowest]
while (defined($_ = $in->getline)) {
# Normalize line ending
$_ =~ s/(?:\n\r|\r\n|\r)$/\n/ if $normalize_newlines;
if (substr($_, 0, 2) eq '--') {
($maybe = $_) =~ s/[ \t\r\n]+\Z//;
$bh{$maybe} and do { $eos = $bh{$maybe}; last };
}
$thx and $th{$_} and do { $eos = $th{$_}; last };
$out->print($last); $last = $_;
}
}
}
# Write out last held line, removing terminating CRLF if ended on bound,
# unless the line consists only of CRLF and we're wanting to keep the
# preceding blank line (as when parsing a preamble)
$last =~ s/[\r\n]+\Z// if ($eos =~ /^(DELIM|CLOSE)/ && !($keep_newline && $last =~ m/^[\r\n]\z/));
$out->print($last);
### Save and return what we finished on:
${$self->{EOS}} = ($eos || 'EOF');
1;
}
#------------------------------
#
# read_lines INHANDLE, \@OUTLINES
#
# I<Instance method.>
# Read lines into the given array.
#
sub read_lines {
my ($self, $in, $outlines) = @_;
my $data = '';
open(my $fh, '>', \$data) or die $!;
$self->read_chunk($in, $fh);
@$outlines = split(/^/, $data);
close $fh;
1;
}
1;
__END__
=head1 SEE ALSO
L<MIME::Tools>, L<MIME::Parser>

View File

@ -0,0 +1,186 @@
package MIME::Parser::Results;
=head1 NAME
MIME::Parser::Results - results of the last entity parsed
=head1 SYNOPSIS
Before reading further, you should see L<MIME::Parser> to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.
Ready? Ok...
### Do parse, get results:
my $entity = eval { $parser->parse(\*STDIN); };
my $results = $parser->results;
### Get all messages logged:
@msgs = $results->msgs;
### Get messages of specific types (also tests if there were problems):
$had_errors = $results->errors;
$had_warnings = $results->warnings;
### Get outermost header:
$top_head = $results->top_head;
=head1 DESCRIPTION
Results from the last MIME::Parser parse.
=head1 PUBLIC INTERFACE
=over 4
=cut
use strict;
### Kit modules:
use MIME::Tools qw(:msgs);
#------------------------------
=item new
I<Constructor.>
=cut
sub new {
bless {
MPI_ID => 'MIME-parser',
MPI_Msgs => [],
MPI_Level => 0,
MPI_TopHead => undef,
}, shift;
}
#------------------------------
=item msgs
I<Instance method.>
Return all messages that we logged, in order.
Every message is a string beginning with its type followed by C<": ">;
the current types are C<debug>, C<warning>, and C<error>.
=cut
sub msgs {
@{shift->{MPI_Msgs}};
}
#------------------------------
=item errors
I<Instance method.>
Return all error messages that we logged, in order.
A convenience front-end onto msgs().
=cut
sub errors {
grep /^error: /, @{shift->{MPI_Msgs}};
}
#------------------------------
=item warnings
I<Instance method.>
Return all warning messages that we logged, in order.
A convenience front-end onto msgs().
=cut
sub warnings {
grep /^warning: /, @{shift->{MPI_Msgs}};
}
#------------------------------
=item top_head
I<Instance method.>
Return the topmost header, if we were able to read it.
This may be useful if the parse fails.
=cut
sub top_head {
my ($self, $head) = @_;
$self->{MPI_TopHead} = $head if @_ > 1;
$self->{MPI_TopHead};
}
#------------------------------
#
# PRIVATE: FOR USE DURING PARSING ONLY!
#
#------------------------------
#
# msg TYPE, MESSAGE...
#
# Take a message.
#
sub msg {
my $self = shift;
my $type = shift;
my @args = map { defined($_) ? $_ : '<<undef>>' } @_;
push @{$self->{MPI_Msgs}}, ($type.": ".join('', @args)."\n");
}
#------------------------------
#
# level [+1|-1]
#
# Return current parsing level.
#
sub level {
my ($self, $lvl) = @_;
$self->{MPI_Level} += $lvl if @_ > 1;
$self->{MPI_Level};
}
#------------------------------
#
# indent
#
# Return indent for current parsing level.
#
sub indent {
my ($self) = @_;
' ' x $self->{MPI_Level};
}
=back
=cut
1;
__END__
=head1 SEE ALSO
L<MIME::Tools>, L<MIME::Parser>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,682 @@
package MIME::WordDecoder;
=head1 NAME
MIME::WordDecoder - decode RFC 2047 encoded words to a local representation
WARNING: Most of this module is deprecated and may disappear. The only
function you should use for MIME decoding is "mime_to_perl_string".
=head1 SYNOPSIS
See L<MIME::Words> for the basics of encoded words.
See L<"DESCRIPTION"> for how this class works.
use MIME::WordDecoder;
### Get the default word-decoder (used by unmime()):
$wd = default MIME::WordDecoder;
### Get a word-decoder which maps to ISO-8859-1 (Latin1):
$wd = supported MIME::WordDecoder "ISO-8859-1";
### Decode a MIME string (e.g., into Latin1) via the default decoder:
$str = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
### Decode a string using the default decoder, non-OO style:
$str = unmime('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
### Decode a string to an internal Perl string, non-OO style
### The result is likely to have the UTF8 flag ON.
$str = mime_to_perl_string('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
=head1 DESCRIPTION
WARNING: Most of this module is deprecated and may disappear. It
duplicates (badly) the function of the standard 'Encode' module. The
only function you should rely on is mime_to_perl_string.
A MIME::WordDecoder consists, fundamentally, of a hash which maps
a character set name (US-ASCII, ISO-8859-1, etc.) to a subroutine which
knows how to take bytes in that character set and turn them into
the target string representation. Ideally, this target representation
would be Unicode, but we don't want to overspecify the translation
that takes place: if you want to convert MIME strings directly to Big5,
that's your own decision.
The subroutine will be invoked with two arguments: DATA (the data in
the given character set), and CHARSET (the upcased character set name).
For example:
### Keep 7-bit characters as-is, convert 8-bit characters to '#':
sub keep7bit {
local $_ = shift;
tr/\x00-\x7F/#/c;
$_;
}
Here's a decoder which uses that:
### Construct a decoder:
$wd = MIME::WordDecoder->new({'US-ASCII' => "KEEP", ### sub { $_[0] }
'ISO-8859-1' => \&keep7bit,
'ISO-8859-2' => \&keep7bit,
'Big5' => "WARN",
'*' => "DIE"});
### Convert some MIME text to a pure ASCII string...
$ascii = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
### ...which will now hold: "To: Keld J#rn Simonsen <keld>"
The UTF-8 built-in decoder decodes everything into Perl's internal
string format, possibly turning on the internal UTF8 flag. Use it like
this:
$wd = supported MIME::WordDecoder 'UTF-8';
$perl_string = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
# perl_string will be a valid UTF-8 string with the "UTF8" flag set.
Generally, you should use the UTF-8 decoder in preference to "unmime".
=head1 PUBLIC INTERFACE
=over
=cut
use strict;
use Carp qw( carp croak );
use MIME::Words qw(decode_mimewords);
use Exporter;
use vars qw(@ISA @EXPORT);
@ISA = qw(Exporter);
@EXPORT = qw( unmime mime_to_perl_string );
#------------------------------
#
# Globals
#
#------------------------------
### Decoders.
my %DecoderFor = ();
### Standard handlers.
my %Handler =
(
KEEP => sub {$_[0]},
IGNORE => sub {''},
WARN => sub { carp "ignoring text in character set `$_[1]'\n" },
DIE => sub { croak "can't handle text in character set `$_[1]'\n" },
);
### Global default decoder. We init it below.
my $Default;
### Global UTF8 decoder.
my $DefaultUTF8;
#------------------------------
=item default [DECODER]
I<Class method.>
Get/set the default DECODER object.
=cut
sub default {
my $class = shift;
if (@_) {
$Default = shift;
}
$Default;
}
#------------------------------
=item supported CHARSET, [DECODER]
I<Class method.>
If just CHARSET is given, returns a decoder object which maps
data into that character set (the character set is forced to
all-uppercase).
$wd = supported MIME::WordDecoder "ISO-8859-1";
If DECODER is given, installs such an object:
MIME::WordDecoder->supported("ISO-8859-1" =>
(new MIME::WordDecoder::ISO_8859 "1"));
You should not override this method.
=cut
sub supported {
my ($class, $charset, $decoder) = @_;
$DecoderFor{uc($charset)} = $decoder if (@_ > 2);
$DecoderFor{uc($charset)};
}
#------------------------------
=item new [\@HANDLERS]
I<Class method, constructor.>
If \@HANDLERS is given, then @HANDLERS is passed to handler()
to initialize the internal map.
=cut
sub new {
my ($class, $h) = @_;
my $self = bless { MWD_Map=>{} }, $class;
### Init the map:
$self->handler(@$h);
### Add fallbacks:
$self->{MWD_Map}{'*'} ||= $Handler{WARN};
$self->{MWD_Map}{'raw'} ||= $self->{MWD_Map}{'US-ASCII'};
$self;
}
#------------------------------
=item handler CHARSET=>\&SUBREF, ...
I<Instance method.>
Set the handler SUBREF for a given CHARSET, for as many pairs
as you care to supply.
When performing the translation of a MIME-encoded string, a
given SUBREF will be invoked when translating a block of text
in character set CHARSET. The subroutine will be invoked with
the following arguments:
DATA - the data in the given character set.
CHARSET - the upcased character set name, which may prove useful
if you are using the same SUBREF for multiple CHARSETs.
DECODER - the decoder itself, if it contains configuration information
that your handler function needs.
For example:
$wd = new MIME::WordDecoder;
$wd->handler('US-ASCII' => "KEEP");
$wd->handler('ISO-8859-1' => \&handle_latin1,
'ISO-8859-2' => \&handle_latin1,
'*' => "DIE");
Notice that, much as with %SIG, the SUBREF can also be taken from
a set of special keywords:
KEEP Pass data through unchanged.
IGNORE Ignore data in this character set, without warning.
WARN Ignore data in this character set, with warning.
DIE Fatal exception with "can't handle character set" message.
The subroutine for the special CHARSET of 'raw' is used for raw
(non-MIME-encoded) text, which is supposed to be US-ASCII.
The handler for 'raw' defaults to whatever was specified for 'US-ASCII'
at the time of construction.
The subroutine for the special CHARSET of '*' is used for any
unrecognized character set. The default action for '*' is WARN.
=cut
sub handler {
my $self = shift;
### Copy the hash, and edit it:
while (@_) {
my $c = shift;
my $sub = shift;
$self->{MWD_Map}{$c} = $self->real_handler($sub);
}
$self;
}
#------------------------------
=item decode STRING
I<Instance method.>
Decode a STRING which might contain MIME-encoded components into a
local representation (e.g., UTF-8, etc.).
=cut
sub decode {
my ($self, $str) = @_;
defined($str) or return undef;
join('', map {
### Get the data and (upcased) charset:
my $data = $_->[0];
my $charset = (defined($_->[1]) ? uc($_->[1]) : 'raw');
$charset =~ s/\*\w+\Z//; ### RFC2184 language suffix
### Get the handler; guess if never seen before:
defined($self->{MWD_Map}{$charset}) or
$self->{MWD_Map}{$charset} =
($self->real_handler($self->guess_handler($charset)) || 0);
my $subr = $self->{MWD_Map}{$charset} || $self->{MWD_Map}{'*'};
### Map this chunk:
&$subr($data, $charset, $self);
} decode_mimewords($str));
}
#------------------------------
#
# guess_handler CHARSET
#
# Instance method.
# An unrecognized charset has been seen. Guess a handler subref
# for the given charset, returning false if there is none.
# Successful mappings will be cached in the main map.
#
sub guess_handler {
undef;
}
#------------------------------
#
# real_handler HANDLER
#
# Instance method.
# Translate the given handler, which might be a subref or a string.
#
sub real_handler {
my ($self, $sub) = @_;
(!$sub) or
(ref($sub) eq 'CODE') or
$sub = ($Handler{$sub} || croak "bad named handler: $sub\n");
$sub;
}
#------------------------------
=item unmime STRING
I<Function, exported.>
Decode the given STRING using the default() decoder.
See L<default()|/default>.
You should consider using the UTF-8 decoder instead. It decodes
MIME strings into Perl's internal string format.
=cut
sub unmime($) {
my $str = shift;
$Default->decode($str);
}
=item mime_to_perl_string
I<Function, exported.>
Decode the given STRING into an internal Perl Unicode string.
You should use this function in preference to all others.
The result of mime_to_perl_string is likely to have Perl's
UTF8 flag set.
=cut
sub mime_to_perl_string($) {
my $str = shift;
$DecoderFor{'UTF-8'}->decode($str);
}
=back
=cut
=head1 SUBCLASSES
=over
=cut
#------------------------------------------------------------
#------------------------------------------------------------
=item MIME::WordDecoder::ISO_8859
A simple decoder which keeps US-ASCII and the 7-bit characters
of ISO-8859 character sets and UTF8, and also keeps 8-bit
characters from the indicated character set.
### Construct:
$wd = new MIME::WordDecoder::ISO_8859 2; ### ISO-8859-2
### What to translate unknown characters to (can also use empty):
### Default is "?".
$wd->unknown("?");
### Collapse runs of unknown characters to a single unknown()?
### Default is false.
$wd->collapse(1);
According to B<http://czyborra.com/charsets/iso8859.html>
(ca. November 2000):
ISO 8859 is a full series of 10 (and soon even more) standardized
multilingual single-byte coded (8bit) graphic character sets for
writing in alphabetic languages:
1. Latin1 (West European)
2. Latin2 (East European)
3. Latin3 (South European)
4. Latin4 (North European)
5. Cyrillic
6. Arabic
7. Greek
8. Hebrew
9. Latin5 (Turkish)
10. Latin6 (Nordic)
The ISO 8859 charsets are not even remotely as complete as the truly
great Unicode but they have been around and usable for quite a while
(first registered Internet charsets for use with MIME) and have
already offered a major improvement over the plain 7bit US-ASCII.
Characters 0 to 127 are always identical with US-ASCII and the
positions 128 to 159 hold some less used control characters: the
so-called C1 set from ISO 6429.
=cut
package MIME::WordDecoder::ISO_8859;
use strict;
use vars qw(@ISA);
@ISA = qw( MIME::WordDecoder );
#------------------------------
#
# HANDLERS
#
#------------------------------
### Keep 7bit characters.
### Turn all else to the special \x00.
sub h_keep7bit {
local $_ = $_[0];
# my $unknown = $_[2]->{MWDI_Unknown};
s{[\x80-\xFF]}{\x00}g;
$_;
}
### Note: should use Unicode::String, converting/manipulating
### everything into full Unicode form.
### Keep 7bit UTF8 characters (ASCII).
### Keep ISO-8859-1 if this decoder is for Latin-1.
### Turn all else to the special \x00.
sub h_utf8 {
local $_ = $_[0];
# my $unknown = $_[2]->{MWDI_Unknown};
my $latin1 = ($_[2]->{MWDI_Num} == 1);
#print STDERR "UTF8 in: <$_>\n";
local($1,$2,$3);
my $tgt = '';
while (m{\G(
([\x00-\x7F]) | # 0xxxxxxx
([\xC0-\xDF] [\x80-\xBF]) | # 110yyyyy 10xxxxxx
([\xE0-\xEF] [\x80-\xBF]{2}) | # 1110zzzz 10yyyyyy 10xxxxxx
([\xF0-\xF7] [\x80-\xBF]{3}) | # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
. # error; synch
)}gcsx and ($1 ne '')) {
if (defined($2)) { $tgt .= $2 }
elsif (defined($3) && $latin1) { $tgt .= "\x00" }
else { $tgt .= "\x00" }
}
#print STDERR "UTF8 out: <$tgt>\n";
$tgt;
}
### Keep characters which are 7bit in UTF8 (ASCII).
### Keep ISO-8859-1 if this decoder is for Latin-1.
### Turn all else to the special \x00.
sub h_utf16 {
local $_ = $_[0];
# my $unknown = $_[2]->{MWDI_Unknown};
my $latin1 = ($_[2]->{MWDI_Num} == 1);
#print STDERR "UTF16 in: <$_>\n";
local($1,$2,$3,$4,$5);
my $tgt = '';
while (m{\G(
( \x00 ([\x00-\x7F])) | # 00000000 0xxxxxxx
( \x00 ([\x80-\xFF])) | # 00000000 1xxxxxxx
( [^\x00] [\x00-\xFF]) | # etc
)
}gcsx and ($1 ne '')) {
if (defined($2)) { $tgt .= $3 }
elsif (defined($4) && $latin1) { $tgt .= $5 }
else { $tgt .= "\x00" }
}
#print STDERR "UTF16 out: <$tgt>\n";
$tgt;
}
#------------------------------
#
# PUBLIC INTERFACE
#
#------------------------------
#------------------------------
#
# new NUMBER
#
sub new {
my ($class, $num) = @_;
my $self = $class->SUPER::new();
$self->handler('raw' => 'KEEP',
'US-ASCII' => 'KEEP');
$self->{MWDI_Num} = $num;
$self->{MWDI_Unknown} = "?";
$self->{MWDI_Collapse} = 0;
$self;
}
#------------------------------
#
# guess_handler CHARSET
#
sub guess_handler {
my ($self, $charset) = @_;
return 'KEEP' if (($charset =~ /^ISO[-_]?8859[-_](\d+)$/) &&
($1 eq $self->{MWDI_Num}));
return \&h_keep7bit if ($charset =~ /^ISO[-_]?8859/);
return \&h_utf8 if ($charset =~ /^UTF[-_]?8$/);
return \&h_utf16 if ($charset =~ /^UTF[-_]?16$/);
undef;
}
#------------------------------
#
# unknown [REPLACEMENT]
#
sub unknown {
my $self = shift;
$self->{MWDI_Unknown} = shift if @_;
$self->{MWDI_Unknown};
}
#------------------------------
#
# collapse [YESNO]
#
sub collapse {
my $self = shift;
$self->{MWDI_Collapse} = shift if @_;
$self->{MWDI_Collapse};
}
#------------------------------
#
# decode STRING
#
sub decode {
my $self = shift;
### Do inherited action:
my $basic = $self->SUPER::decode(@_);
defined($basic) or return undef;
### Translate/consolidate illegal characters:
$basic =~ tr{\x00}{\x00}c if $self->{MWDI_Collapse};
$basic =~ s{\x00}{$self->{MWDI_Unknown}}g;
$basic;
}
#------------------------------------------------------------
#------------------------------------------------------------
=item MIME::WordDecoder::US_ASCII
A subclass of the ISO-8859-1 decoder which discards 8-bit characters.
You're probably better off using ISO-8859-1.
=cut
package MIME::WordDecoder::US_ASCII;
use strict;
use vars qw(@ISA);
@ISA = qw( MIME::WordDecoder::ISO_8859 );
sub new {
my ($class) = @_;
return $class->SUPER::new("1");
}
sub decode {
my $self = shift;
### Do inherited action:
my $basic = $self->SUPER::decode(@_);
defined($basic) or return undef;
### Translate/consolidate 8-bit characters:
$basic =~ tr{\x80-\xFF}{}c if $self->{MWDI_Collapse};
$basic =~ s{[\x80-\xFF]}{$self->{MWDI_Unknown}}g;
$basic;
}
=back
=cut
package MIME::WordDecoder::UTF_8;
use strict;
use Encode qw();
use Carp qw( carp );
use vars qw(@ISA);
@ISA = qw( MIME::WordDecoder );
sub h_convert_to_utf8
{
my ($data, $charset, $decoder) = @_;
$charset = 'US-ASCII' if ($charset eq 'raw');
my $enc = Encode::find_encoding($charset);
if (!$enc) {
carp "Unable to convert text in character set `$charset' to UTF-8... ignoring\n";
return '';
}
my $ans = $enc->decode($data, Encode::FB_PERLQQ);
return $ans;
}
sub new {
my ($class) = @_;
my $self = $class->SUPER::new();
$self->handler('*' => \&h_convert_to_utf8);
}
#------------------------------------------------------------
#------------------------------------------------------------
package MIME::WordDecoder;
### Now we can init the default handler.
$Default = (MIME::WordDecoder::ISO_8859->new('1'));
### Add US-ASCII handler:
$DecoderFor{"US-ASCII"} = MIME::WordDecoder::US_ASCII->new;
### Add ISO-8859-{1..15} handlers:
for (1..15) {
$DecoderFor{"ISO-8859-$_"} = MIME::WordDecoder::ISO_8859->new($_);
}
### UTF-8
$DecoderFor{'UTF-8'} = MIME::WordDecoder::UTF_8->new();
1; # end the module
__END__
=head1 SEE ALSO
L<MIME::Tools>
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
=cut
BEGIN { unshift @INC, ".", "./etc", "./lib" };
import MIME::WordDecoder;
### Decode a MIME string (e.g., into Latin1) via the default decoder:
my $charset = $ARGV[0] || 'ISO-8859-1';
my $wd = MIME::WordDecoder->supported($charset) || die "unsupported charset: $charset\n";
$wd->unknown('#');
my @encs = (
'ASCII: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>',
'Latin1: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
'Latin1: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>',
'Latin1: =?ISO-8859-1?Q?Andr=E9_?=Pirard <PIRARD@vm1.ulg.ac.be>',
' UTF-8: =?UTF-8?Q?Andr=E9_?=Pirard <PIRARD@vm1.ulg.ac.be>',
'UTF-16: =?UTF-16?Q?=00A=00n=00d=00r=00=E9?= Pirard <PIRARD@vm1.ulg.ac.be>',
('=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?='.
'=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?='.
'=?US-ASCII?Q?.._cool!?='));
$str = $wd->decode(join "\n", @encs);
print "$str\n";
1;

View File

@ -0,0 +1,353 @@
package MIME::Words;
=head1 NAME
MIME::Words - deal with RFC 2047 encoded words
=head1 SYNOPSIS
Before reading further, you should see L<MIME::Tools> to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.
Ready? Ok...
use MIME::Words qw(:all);
### Decode the string into another string, forgetting the charsets:
$decoded = decode_mimewords(
'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
);
### Split string into array of decoded [DATA,CHARSET] pairs:
@decoded = decode_mimewords(
'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
);
### Encode a single unsafe word:
$encoded = encode_mimeword("\xABFran\xE7ois\xBB");
### Encode a string, trying to find the unsafe words inside it:
$encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB in town");
=head1 DESCRIPTION
Fellow Americans, you probably won't know what the hell this module
is for. Europeans, Russians, et al, you probably do. C<:-)>.
For example, here's a valid MIME header you might get:
From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
=?US-ASCII?Q?.._cool!?=
The fields basically decode to (sorry, I can only approximate the
Latin characters with 7 bit sequences /o and 'e):
From: Keith Moore <moore@cs.utk.edu>
To: Keld J/orn Simonsen <keld@dkuug.dk>
CC: Andr'e Pirard <PIRARD@vm1.ulg.ac.be>
Subject: If you can read this you understand the example... cool!
=head1 PUBLIC INTERFACE
=over 4
=cut
require 5.001;
### Pragmas:
use strict;
use re 'taint';
use vars qw($VERSION @EXPORT_OK %EXPORT_TAGS @ISA);
### Exporting:
use Exporter;
%EXPORT_TAGS = (all => [qw(decode_mimewords
encode_mimeword
encode_mimewords
)]);
Exporter::export_ok_tags('all');
### Inheritance:
@ISA = qw(Exporter);
### Other modules:
use MIME::Base64;
use MIME::QuotedPrint;
#------------------------------
#
# Globals...
#
#------------------------------
### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";
### Nonprintables (controls + x7F + 8bit):
my $NONPRINT = "\\x00-\\x1F\\x7F-\\xFF";
#------------------------------
# _decode_Q STRING
# Private: used by _decode_header() to decode "Q" encoding, which is
# almost, but not exactly, quoted-printable. :-P
sub _decode_Q {
my $str = shift;
local $1;
$str =~ s/_/\x20/g; # RFC-1522, Q rule 2
$str =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge; # RFC-1522, Q rule 1
$str;
}
# _encode_Q STRING
# Private: used by _encode_header() to decode "Q" encoding, which is
# almost, but not exactly, quoted-printable. :-P
sub _encode_Q {
my $str = shift;
local $1;
$str =~ s{([ _\?\=$NONPRINT])}{sprintf("=%02X", ord($1))}eog;
$str;
}
# _decode_B STRING
# Private: used by _decode_header() to decode "B" encoding.
sub _decode_B {
my $str = shift;
decode_base64($str);
}
# _encode_B STRING
# Private: used by _decode_header() to decode "B" encoding.
sub _encode_B {
my $str = shift;
encode_base64($str, '');
}
#------------------------------
=item decode_mimewords ENCODED
I<Function.>
Go through the string looking for RFC 2047-style "Q"
(quoted-printable, sort of) or "B" (base64) encoding, and decode them.
B<In an array context,> splits the ENCODED string into a list of decoded
C<[DATA, CHARSET]> pairs, and returns that list. Unencoded
data are returned in a 1-element array C<[DATA]>, giving an effective
CHARSET of C<undef>.
$enc = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';
foreach (decode_mimewords($enc)) {
print "", ($_->[1] || 'US-ASCII'), ": ", $_->[0], "\n";
}
B<In a scalar context,> joins the "data" elements of the above
list together, and returns that. I<Warning: this is information-lossy,>
and probably I<not> what you want, but if you know that all charsets
in the ENCODED string are identical, it might be useful to you.
(Before you use this, please see L<MIME::WordDecoder/unmime>,
which is probably what you want.)
In the event of a syntax error, $@ will be set to a description
of the error, but parsing will continue as best as possible (so as to
get I<something> back when decoding headers).
$@ will be false if no error was detected.
Any arguments past the ENCODED string are taken to define a hash of options:
=cut
sub decode_mimewords {
my $encstr = shift;
my @tokens;
local($1,$2,$3);
$@ = ''; ### error-return
### Collapse boundaries between adjacent encoded words:
$encstr =~ s{(\?\=)\s*(\=\?)}{$1$2}gs;
pos($encstr) = 0;
### print STDOUT "ENC = [", $encstr, "]\n";
### Decode:
my ($charset, $encoding, $enc, $dec);
while (1) {
last if (pos($encstr) >= length($encstr));
my $pos = pos($encstr); ### save it
### Case 1: are we looking at "=?..?..?="?
if ($encstr =~ m{\G # from where we left off..
=\?([^?]*) # "=?" + charset +
\?([bq]) # "?" + encoding +
\?([^?]+) # "?" + data maybe with spcs +
\?= # "?="
}xgi) {
($charset, $encoding, $enc) = ($1, lc($2), $3);
$dec = (($encoding eq 'q') ? _decode_Q($enc) : _decode_B($enc));
push @tokens, [$dec, $charset];
next;
}
### Case 2: are we looking at a bad "=?..." prefix?
### We need this to detect problems for case 3, which stops at "=?":
pos($encstr) = $pos; # reset the pointer.
if ($encstr =~ m{\G=\?}xg) {
$@ .= qq|unterminated "=?..?..?=" in "$encstr" (pos $pos)\n|;
push @tokens, ['=?'];
next;
}
### Case 3: are we looking at ordinary text?
pos($encstr) = $pos; # reset the pointer.
if ($encstr =~ m{\G # from where we left off...
(.*? # shortest possible string,
\n*) # followed by 0 or more NLs,
(?=(\Z|=\?)) # terminated by "=?" or EOS
}sxg) {
length($1) or die "MIME::Words: internal logic err: empty token\n";
push @tokens, [$1];
next;
}
### Case 4: bug!
die "MIME::Words: unexpected case:\n($encstr) pos $pos\n\t".
"Please alert developer.\n";
}
return (wantarray ? @tokens : join('',map {$_->[0]} @tokens));
}
#------------------------------
=item encode_mimeword RAW, [ENCODING], [CHARSET]
I<Function.>
Encode a single RAW "word" that has unsafe characters.
The "word" will be encoded in its entirety.
### Encode "<<Franc,ois>>":
$encoded = encode_mimeword("\xABFran\xE7ois\xBB");
You may specify the ENCODING (C<"Q"> or C<"B">), which defaults to C<"Q">.
You may specify the CHARSET, which defaults to C<iso-8859-1>.
=cut
sub encode_mimeword {
my $word = shift;
my $encoding = uc(shift || 'Q');
my $charset = uc(shift || 'ISO-8859-1');
my $encfunc = (($encoding eq 'Q') ? \&_encode_Q : \&_encode_B);
"=?$charset?$encoding?" . &$encfunc($word) . "?=";
}
#------------------------------
=item encode_mimewords RAW, [OPTS]
I<Function.>
Given a RAW string, try to find and encode all "unsafe" sequences
of characters:
### Encode a string with some unsafe "words":
$encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB");
Returns the encoded string.
Any arguments past the RAW string are taken to define a hash of options:
=over 4
=item Charset
Encode all unsafe stuff with this charset. Default is 'ISO-8859-1',
a.k.a. "Latin-1".
=item Encoding
The encoding to use, C<"q"> or C<"b">. The default is C<"q">.
=back
B<Warning:> this is a quick-and-dirty solution, intended for character
sets which overlap ASCII. B<It does not comply with the RFC 2047
rules regarding the use of encoded words in message headers>.
You may want to roll your own variant,
using C<encode_mimeword()>, for your application.
I<Thanks to Jan Kasprzak for reminding me about this problem.>
=cut
sub encode_mimewords {
my ($rawstr, %params) = @_;
my $charset = $params{Charset} || 'ISO-8859-1';
my $encoding = lc($params{Encoding} || 'q');
### Encode any "words" with unsafe characters.
### We limit such words to 18 characters, to guarantee that the
### worst-case encoding give us no more than 54 + ~10 < 75 characters
my $word;
local $1;
$rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]+\s*)}{ ### get next "word"
$word = $1;
(($word !~ /(?:[$NONPRINT])|(?:^\s+$)/o)
? $word ### no unsafe chars
: encode_mimeword($word, $encoding, $charset)); ### has unsafe chars
}xeg;
$rawstr =~ s/\?==\?/?= =?/g;
$rawstr;
}
1;
__END__
=back
=head1 SEE ALSO
L<MIME::Base64>, L<MIME::QuotedPrint>, L<MIME::Tools>
For other implementations of this or similar functionality (particularly, ones
with proper UTF8 support), see:
L<Encode::MIME::Header>, L<MIME::EncWords>, L<MIME::AltWords>
At some future point, one of these implementations will likely replace
MIME::Words and MIME::Words will become deprecated.
=head1 NOTES
Exports its principle functions by default, in keeping with
MIME::Base64 and MIME::QuotedPrint.
=head1 AUTHOR
Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
All rights reserved. This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
Thanks also to...
Kent Boortz For providing the idea, and the baseline
RFC-1522-decoding code!
KJJ at PrimeNet For requesting that this be split into
its own module.
Stephane Barizien For reporting a nasty bug.