class String
Extensions to the String class.
TODO: make riphtml() just call ircify_html() with stronger purify options.
We start by extending the String class with some IRC-specific methods.
Extension for the String class.
String#% method which accepts “named arguments”. Using named arguments instead of %s/%d-style placeholders lets translators understand the meaning of each msgid.
Public Instance Methods
Format: uses str as a format specification and returns the result of applying it to args. If the format specification contains more than one substitution, args must be an Array containing the values to be substituted. See Kernel::sprintf for details of the format string. This is the default behavior of the String class.
(e.g.) "%s, %s" % ["Masao", "Mutoh"]
You can also pass a Hash as “named arguments”. This is the recommended style for Ruby-GetText because translators can easily understand the meaning of each msgid. Each %{key} in the string is replaced by the corresponding value (converted with to_s); placeholders without a matching key are left unchanged.
- hash: {:key1 => value1, :key2 => value2, … }
- Returns: the formatted String
- If a non-Hash argument raises ArgumentError, the format string and the arguments are printed to $stderr and nil is returned.
(e.g.) "%{firstname}, %{familyname}" % {:firstname => "Masao", :familyname => "Mutoh"}
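```ruby
# File lib/rbot/load-gettext.rb, line 209
def %(args)
  if args.is_a?(Hash)
    # Named arguments: replace each %{key} with value.to_s;
    # placeholders without a matching key are left as they are
    ret = dup
    args.each do |key, value|
      ret.gsub!(/\%\{#{key}\}/, value.to_s)
    end
    ret
  else
    # Positional arguments: escape %{ so that sprintf treats it literally,
    # then delegate to the previous String#% (kept as _old_format_m)
    ret = gsub(/%\{/, '%%{')
    begin
      ret._old_format_m(args)
    rescue ArgumentError
      # Report the failing format; the method then returns nil
      $stderr.puts "  The string:#{ret}"
      $stderr.puts "  args:#{args.inspect}"
    end
  end
end
```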
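A minimal sketch of the Hash branch above, written as a standalone function so it runs without rbot or Ruby-GetText. The name named_format is hypothetical. Unlike the listing, it escapes each key with Regexp.escape and uses the block form of gsub!, so keys containing regex metacharacters and values containing backslashes are substituted literally.

```ruby
# Hypothetical standalone version of the Hash branch of String#%:
# each %{key} placeholder is replaced by value.to_s, and placeholders
# without a matching key are left untouched.
def named_format(template, args)
  result = template.dup
  args.each do |key, value|
    # Escape the key so it is matched literally; the block form of gsub!
    # keeps backslashes in the value from being read as back-references.
    result.gsub!(/%\{#{Regexp.escape(key.to_s)}\}/) { value.to_s }
  end
  result
end

puts named_format("%{firstname}, %{familyname}",
                  firstname: "Masao", familyname: "Mutoh") # prints Masao, Mutoh
puts named_format("%{greeting}, %{name}", greeting: "Hello") # prints Hello, %{name}
```

Plain Ruby (1.9 and later) also accepts %{name} with a Hash in its built-in String#%, but it raises KeyError for a missing key instead of leaving the placeholder in place.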
Formats a string using IRC text formatting: each run of text enclosed in asterisks (*like this*) is made bold.
```ruby
# File lib/rbot/core/utils/extends.rb, line 376
def colorformat
  txt = self.dup
  txt.gsub!(/\*([^\*]+)\*/, Bold + '\\1' + NormalText)
  return txt
end
```
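The sketch below reproduces colorformat outside rbot. The constants BOLD and NORMAL_TEXT are assumptions standing in for rbot's Bold and NormalText; they are given the conventional IRC control codes \x02 (bold toggle) and \x0F (reset formatting).

```ruby
# Assumed values for rbot's Bold and NormalText constants
BOLD = "\x02"
NORMAL_TEXT = "\x0F"

# Standalone version of String#colorformat: *text* becomes bold text
def colorformat(text)
  text.gsub(/\*([^*]+)\*/) { BOLD + Regexp.last_match(1) + NORMAL_TEXT }
end

p colorformat("this is *important*, not *optional*")
```

The block form of gsub avoids the '\\1' escaping that the string replacement in the listing needs.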
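This method tries to find an HTML title in the string and returns its contents if found. It uses Hpricot when that library is loaded and falls back to matching Irc::Utils::TITLE_REGEX otherwise.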
```ruby
# File lib/rbot/core/utils/extends.rb, line 349
def get_html_title
  if defined? ::Hpricot
    # at returns nil when there is no <title> element
    title = Hpricot(self).at("title")
    title && title.inner_html
  else
    return unless Irc::Utils::TITLE_REGEX.match(self)
    $1
  end
end
```
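A minimal sketch of the regex fallback path, usable without Hpricot. The pattern HTML_TITLE_REGEX and the function html_title are hypothetical: the actual Irc::Utils::TITLE_REGEX is not shown on this page and may differ.

```ruby
# Hypothetical stand-in for Irc::Utils::TITLE_REGEX (case-insensitive,
# and /m lets the title span several lines)
HTML_TITLE_REGEX = %r{<\s*title[^>]*>(.*?)<\s*/\s*title\s*>}im

# Returns the contents of the first <title> element, or nil if there is none
def html_title(html)
  match = HTML_TITLE_REGEX.match(html)
  match && match[1]
end

p html_title("<html><head><TITLE>rbot home</TITLE></head></html>")
p html_title("<p>no title here</p>")
```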
This method checks whether the receiver contains unescaped IRC glob characters.
IRC has a very primitive concept of globs: a * stands for “any number of arbitrary characters”, and a ? stands for “exactly one arbitrary character”. These characters can be escaped by prefixing them with a backslash (\).
A known limitation of this glob syntax is that there is no way to escape the escape character itself, so it is not possible to build a glob pattern in which a literal backslash precedes a glob character.
```ruby
# File lib/rbot/irc.rb, line 333
def has_irc_glob?
  self =~ /^[*?]|[^\\][*?]/
end
```
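The check can be tried on its own with the standalone function below; irc_glob? is a hypothetical wrapper that turns the match index into true or false. The last case illustrates the limitation described above: in a\\* the * is preceded by a backslash, so it counts as escaped even if a literal backslash followed by a glob was intended.

```ruby
# Standalone version of String#has_irc_glob?: true if str contains a * or ?
# that is either at the very start or not preceded by a backslash
def irc_glob?(str)
  !!(str =~ /^[*?]|[^\\][*?]/)
end

p irc_glob?("*!*@example.com") # true: leading glob
p irc_glob?("nick?")           # true
p irc_glob?("plain")           # false
p irc_glob?("nick\\*")         # false: the * is escaped
p irc_glob?("a\\\\*")          # false: a\\* cannot mean "literal \ then glob"
```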
This method returns the downcased version of the receiver according to the given casemap (rfc1459 by default).
```ruby
# File lib/rbot/irc.rb, line 290
def irc_downcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.upper, cmap.lower)
end
```
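Casemaps matter because, under the rfc1459 casemap, the characters []\^ are treated as the uppercase forms of {}|~. The sketch below mimics irc_downcase with String#tr. The CASEMAPS table is a hypothetical stand-in for Irc::Casemap, whose upper and lower strings are assumed here to be tr-compatible character sets.

```ruby
# Hypothetical stand-in for Irc::Casemap: [upper, lower] character sets for
# String#tr. Inside a tr set a backslash must itself be escaped.
CASEMAPS = {
  'ascii'          => ['A-Z',        'a-z'],
  'rfc1459'        => ['A-Z[]\\\\^', 'a-z{}|~'],
  'strict-rfc1459' => ['A-Z[]\\\\',  'a-z{}|'],
}.freeze

# Unlike rbot's to_irc_casemap, an unknown name raises KeyError here
def irc_downcase(str, casemap = 'rfc1459')
  upper, lower = CASEMAPS.fetch(casemap)
  str.tr(upper, lower)
end

p irc_downcase("Nick[Away]\\^")                   # rfc1459
p irc_downcase("Nick[Away]\\^", 'strict-rfc1459') # ^ is left alone
p irc_downcase("Nick[Away]\\^", 'ascii')          # only A-Z change
```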
This is the same as irc_downcase, except that the string is altered in place. Like String#tr!, it returns nil if no characters were changed.
See also the discussion about irc_downcase
```ruby
# File lib/rbot/irc.rb, line 299
def irc_downcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.upper, cmap.lower)
end
```
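Calculates the penalty, in seconds, that the IRC server is expected to assign to this message. The estimate follows eggdrop's rules (a base penalty of 1 + size/100, plus a per-command surcharge) and is clamped to the range 2 to 99.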
```ruby
# File lib/rbot/ircsocket.rb, line 14
def irc_send_penalty
  # According to eggdrop, the initial penalty is
  penalty = 1 + self.size/100
  # on everything but UnderNET where it's
  # penalty = 2 + self.size/120

  cmd, pars = self.split($;,2)
  debug "cmd: #{cmd}, pars: #{pars.inspect}"
  case cmd.to_sym
  when :KICK
    chan, nick, msg = pars.split
    chan = chan.split(',')
    nick = nick.split(',')
    penalty += nick.size
    penalty *= chan.size
  when :MODE
    chan, modes, argument = pars.split
    extra = 0
    if modes
      extra = 1
      if argument
        extra += modes.split(/\+|-/).size
      else
        extra += 3 * modes.split(/\+|-/).size
      end
    end
    if argument
      extra += 2 * argument.split.size
    end
    penalty += extra * chan.split.size
  when :TOPIC
    penalty += 1
    penalty += 2 unless pars.split.size < 2
  when :PRIVMSG, :NOTICE
    dests = pars.split($;,2).first
    penalty += dests.split(',').size
  when :WHO
    args = pars.split
    if args.length > 0
      penalty += args.inject(0){ |sum,x| sum += ((x.length > 4) ? 3 : 5) }
    else
      penalty += 10
    end
  when :PART
    penalty += 4
  when :AWAY, :JOIN, :VERSION, :TIME, :TRACE, :WHOIS, :DNS
    penalty += 2
  when :INVITE, :NICK
    penalty += 3
  when :ISON
    penalty += 1
  else # Unknown messages
    penalty += 1
  end
  if penalty > 99
    debug "Wow, more than 99 secs of penalty!"
    penalty = 99
  end
  if penalty < 2
    debug "Wow, less than 2 secs of penalty!"
    penalty = 2
  end
  debug "penalty: #{penalty}"
  return penalty
end
```
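To make the arithmetic concrete, here is a reduced sketch that covers only a handful of the commands above; the helper name send_penalty is hypothetical. For example, PRIVMSG #a,#b :hello is 20 characters long, so the base penalty is 1 + 20/100 = 1 with integer division; two comma-separated destinations add 2, for a total of 3, which already lies within the 2 to 99 bounds.

```ruby
# Reduced, standalone version of String#irc_send_penalty (subset of commands)
def send_penalty(line)
  # eggdrop-style base penalty: 1 second plus 1 per full 100 characters
  penalty = 1 + line.size / 100
  cmd, pars = line.split(' ', 2) # awk-style split, like split($;, 2)
  case cmd.to_sym
  when :PRIVMSG, :NOTICE
    # one extra second per comma-separated destination
    dests = pars.split(' ', 2).first
    penalty += dests.split(',').size
  when :PART
    penalty += 4
  when :JOIN, :WHOIS
    penalty += 2
  else
    penalty += 1 # unknown messages
  end
  # Same bounds as the full method
  penalty.clamp(2, 99)
end

p send_penalty("PRIVMSG #a,#b :hello") # 3
p send_penalty("PART #rbot")           # 5
p send_penalty("PING :server")         # 2
```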
Upcasing functions are provided too: irc_upcase returns the upcased version of the receiver according to the given casemap.
See also the discussion about irc_downcase
```ruby
# File lib/rbot/irc.rb, line 308
def irc_upcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.lower, cmap.upper)
end
```
In-place upcasing. Like String#tr!, it returns nil if no characters were changed.
See also the discussion about irc_downcase
```ruby
# File lib/rbot/irc.rb, line 317
def irc_upcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.lower, cmap.upper)
end
```
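The upcasing and in-place variants simply swap the tr arguments or call String#tr! instead. One consequence worth knowing, sketched below with a hypothetical hard-coded rfc1459 table: because String#tr! returns nil when nothing changes, the bang variants also return nil in that case, so their results should not be chained.

```ruby
# Hypothetical rfc1459 character sets in String#tr syntax
RFC1459_UPPER = 'A-Z[]\\\\^'
RFC1459_LOWER = 'a-z{}|~'

def irc_upcase(str)
  str.tr(RFC1459_LOWER, RFC1459_UPPER)
end

def irc_upcase!(str)
  str.tr!(RFC1459_LOWER, RFC1459_UPPER) # nil if nothing was changed
end

p irc_upcase("nick{away}|~")  # "NICK[AWAY]\\^"
s = +"nick{away}"
p irc_upcase!(s)              # the modified string itself
p irc_upcase!(+"ALREADY[UP]") # nil: no character changed
```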
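This method returns a purified version of the receiver, with all HTML stripped off and some of it converted to IRC formatting. Recognized options: :a_href (Reverse, Bold or Underline to mark links with that formatting code, or :link_out to render each link as “text: URL”), :img (a format string that replaces each image, with %{alt}, %{dimensions} and %{src} filled in), :nbsp (:space or ' ' to turn non-breaking spaces into plain spaces) and :limit (maximum length; longer text is truncated and followed by a reverse-video ellipsis).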
```ruby
# File lib/rbot/core/utils/extends.rb, line 225
def ircify_html(opts={})
  txt = self.dup

  # remove scripts
  txt.gsub!(/<script(?:\s+[^>]*)?>.*?<\/script>/im, "")

  # remove styles
  txt.gsub!(/<style(?:\s+[^>]*)?>.*?<\/style>/im, "")

  # bold and strong -> bold
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, "#{Bold}")

  # italic, emphasis and underline -> underline
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, "#{Underline}")

  ## This would be a nice addition, but the results are horrible
  ## Maybe make it configurable?
  # txt.gsub!(/<\/?a( [^>]*)?>/, "#{Reverse}")
  case val = opts[:a_href]
  when Reverse, Bold, Underline
    txt.gsub!(/<(?:\/a\s*|a (?:[^>]*\s+)?href\s*=\s*(?:[^>]*\s*)?)>/, val)
  when :link_out
    # Not good for nested links, but the best we can do without something like hpricot
    txt.gsub!(/<a (?:[^>]*\s+)?href\s*=\s*(?:([^"'>][^\s>]*)\s+|"((?:[^"]|\\")*)"|'((?:[^']|\\')*)')(?:[^>]*\s+)?>(.*?)<\/a>/) { |match|
      debug match
      debug [$1, $2, $3, $4].inspect
      link = $1 || $2 || $3
      str = $4
      str + ": " + link
    }
  else
    warning "unknown :a_href option #{val} passed to ircify_html" if val
  end

  # If opts[:img] is defined, it should be a String. Each image
  # will be replaced by the string itself, replacing occurrences of
  # %{alt} %{dimensions} and %{src} with the alt text, image dimensions
  # and URL
  if val = opts[:img]
    if val.kind_of? String
      txt.gsub!(/<img\s+(.*?)\s*\/?>/) do |imgtag|
        attrs = Hash.new
        imgtag.scan(/([[:alpha:]]+)\s*=\s*(['"])?(.*?)\2/) do |key, quote, value|
          k = key.downcase.intern rescue 'junk'
          attrs[k] = value
        end
        attrs[:alt] ||= attrs[:title]
        attrs[:width] ||= '...'
        attrs[:height] ||= '...'
        attrs[:dimensions] ||= "#{attrs[:width]}x#{attrs[:height]}"
        val % attrs
      end
    else
      warning ":img option is not a string"
    end
  end

  # Paragraph and br tags are converted to whitespace
  txt.gsub!(/<\/?(p|br)(?:\s+[^>]*)?\s*\/?\s*>/i, ' ')
  txt.gsub!("\n", ' ')
  txt.gsub!("\r", ' ')

  # Superscripts and subscripts are turned into ^{...} and _{...}
  # where the {} are omitted for single characters
  txt.gsub!(/<sup>(.*?)<\/sup>/, '^{\1}')
  txt.gsub!(/<sub>(.*?)<\/sub>/, '_{\1}')
  # Inside the character class ^ must be escaped, or it would negate the class
  txt.gsub!(/([\^_])\{(.)\}/, '\1\2')

  # List items are converted to *). We don't have special support for
  # nested or ordered lists.
  txt.gsub!(/<li>/, ' *) ')

  # All other tags are just removed
  txt.gsub!(/<[^>]+>/, '')

  # Convert HTML entities. We do it now to be able to handle stuff
  # such as &nbsp;
  txt = Utils.decode_html_entities(txt)

  # Keep unbreakable spaces or convert them to plain spaces?
  case val = opts[:nbsp]
  when :space, ' '
    txt.gsub!([160].pack('U'), ' ')
  else
    warning "unknown :nbsp option #{val} passed to ircify_html" if val
  end

  # Remove double formatting options, since they only waste bytes
  txt.gsub!(/#{Bold}(\s*)#{Bold}/, '\1')
  txt.gsub!(/#{Underline}(\s*)#{Underline}/, '\1')

  # Simplify whitespace that appears on both sides of a formatting option
  txt.gsub!(/\s+(#{Bold}|#{Underline})\s+/, ' \1')
  txt.sub!(/\s+(#{Bold}|#{Underline})\z/, '\1')
  txt.sub!(/\A(#{Bold}|#{Underline})\s+/, '\1')

  # And finally whitespace is squeezed
  txt.gsub!(/\s+/, ' ')
  txt.strip!

  if opts[:limit] && txt.size > opts[:limit]
    txt = txt.slice(0, opts[:limit]) + "#{Reverse}...#{Reverse}"
  end

  return txt
end
```
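A reduced, standalone sketch of the same pipeline, covering only script removal, bold tags, paragraph and line-break tags, generic tag stripping, entity decoding, whitespace squeezing and the :limit option. Several pieces are assumptions: mini_ircify_html is a hypothetical name, the small ENTITIES table stands in for rbot's Utils.decode_html_entities (which handles far more entities), and BOLD and REVERSE are the conventional IRC control codes assumed for rbot's Bold and Reverse.

```ruby
BOLD = "\x02"    # assumed value of rbot's Bold
REVERSE = "\x16" # assumed value of rbot's Reverse

# Tiny stand-in for Utils.decode_html_entities: only a few named entities
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => "\u00A0"
}.freeze

# Reduced sketch of String#ircify_html (a subset of the steps above)
def mini_ircify_html(html, limit: nil)
  txt = html.gsub(%r{<script(?:\s+[^>]*)?>.*?</script>}im, '')
  txt = txt.gsub(%r{</?(?:b|strong)(?:\s+[^>]*)?>}i, BOLD)
  txt = txt.gsub(%r{</?(?:p|br)(?:\s+[^>]*)?\s*/?\s*>}i, ' ')
  txt = txt.gsub(/<[^>]+>/, '')                      # drop all other tags
  txt = txt.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES)
  txt = txt.gsub(/\s+/, ' ').strip                   # squeeze whitespace
  if limit && txt.size > limit
    txt = txt.slice(0, limit) + "#{REVERSE}...#{REVERSE}"
  end
  txt
end

p mini_ircify_html("<p>Hello <b>world</b> &amp; friends</p><script>x()</script>")
p mini_ircify_html("<p>abcdefgh</p>", limit: 3)
```

As in the full method, entities are decoded only after all tags are gone, so a decoded &lt; cannot be mistaken for the start of a tag.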
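As ircify_html, but modifies the receiver in place. Like Ruby's built-in bang methods, it returns self if the string changed and nil otherwise.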
```ruby
# File lib/rbot/core/utils/extends.rb, line 335
def ircify_html!(opts={})
  old_hash = self.hash
  replace self.ircify_html(opts)
  return self unless self.hash == old_hash
end
```
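The bang variant follows Ruby's convention of returning nil when nothing changed. The sketch below shows that contract with a hypothetical squish_ws! method; it compares the old and new contents directly rather than their hash values, which avoids relying on hash codes never colliding.

```ruby
class String
  # Hypothetical bang method with the same contract as ircify_html!:
  # rewrite the receiver in place, return self if it changed, nil otherwise
  def squish_ws!
    old = dup
    replace(gsub(/\s+/, ' ').strip)
    self == old ? nil : self
  end
end

s = +"too   many  spaces "
p s.squish_ws! # "too many spaces"
p s.squish_ws! # nil: already squished
```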
This method returns the IRC-formatted version of an HTML title found in the string, or nil if no title can be extracted.
```ruby
# File lib/rbot/core/utils/extends.rb, line 360
def ircify_html_title
  self.get_html_title.ircify_html rescue nil
end
```
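Removes non-ASCII characters from the string. Invalid byte sequences and characters that have no ASCII representation are replaced with replace (the empty string by default, which simply removes them), and line breaks are normalized to \n.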
```ruby
# File lib/rbot/core/utils/extends.rb, line 385
def remove_nonascii(replace='')
  encoding_options = {
    :invalid           => :replace,  # Replace invalid byte sequences
    :undef             => :replace,  # Replace anything not defined in ASCII
    :replace           => replace,
    :universal_newline => true       # Always break lines with \n
  }
  # Passed as keywords: since Ruby 3.0 a trailing positional Hash is
  # no longer converted into keyword options
  self.encode(Encoding.find('ASCII'), **encoding_options)
end
```
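The same String#encode call can be tried on its own; remove_nonascii below is a standalone copy that takes the string as an argument. Encoding::US_ASCII is the encoding that Encoding.find('ASCII') resolves to.

```ruby
# Standalone version of String#remove_nonascii
def remove_nonascii(str, replace = '')
  str.encode(Encoding::US_ASCII,
             invalid: :replace,       # replace invalid byte sequences
             undef: :replace,         # replace characters with no ASCII form
             replace: replace,
             universal_newline: true) # normalize CR and CRLF to LF
end

p remove_nonascii("café crème") # non-ASCII letters dropped
p remove_nonascii("naïve", "?") # non-ASCII letter replaced with "?"
```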
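This method strips all HTML from the receiver: newlines and <br> tags become spaces, the remaining tags are removed, HTML entities are decoded and runs of whitespace are squeezed into a single space.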
```ruby
# File lib/rbot/core/utils/extends.rb, line 343
def riphtml
  Utils.decode_html_entities(
    self.gsub("\n",' ').gsub(/<\s*br\s*\/?\s*>/, ' ').gsub(/<[^>]+>/, '')
  ).gsub(/\s+/,' ')
end
```
Returns an Irc::Bot::Auth::Command from the receiver.
```ruby
# File lib/rbot/botuser.rb, line 120
def to_irc_auth_command
  Irc::Bot::Auth::Command.new(self)
end
```
This method returns the Irc::Casemap whose name is the receiver. If no such casemap exists, it logs an error and falls back to the rfc1459 casemap.
```ruby
# File lib/rbot/irc.rb, line 276
def to_irc_casemap
  begin
    Irc::Casemap.get(self)
  rescue
    # raise TypeError, "Unknown Irc::Casemap #{self.inspect}"
    error "Unknown Irc::Casemap #{self.inspect} requested, defaulting to rfc1459"
    Irc::Casemap.get('rfc1459')
  end
end
```
We keep extending String, this time adding a method that converts a String into an Irc::Channel object.
```ruby
# File lib/rbot/irc.rb, line 1520
def to_irc_channel(opts={})
  Irc::Channel.new(self, opts)
end
```
Returns an Irc::Channel::Topic with self as its text.
```ruby
# File lib/rbot/irc.rb, line 1325
def to_irc_channel_topic
  Irc::Channel::Topic.new(self)
end
```
We keep extending String, this time adding a method that converts a String into an Irc::Netmask object.
```ruby
# File lib/rbot/irc.rb, line 922
def to_irc_netmask(opts={})
  Irc::Netmask.new(self, opts)
end
```
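This method converts the receiver into a regular expression that matches according to the IRC glob syntax: * becomes .*, ? becomes ., backslash-escaped glob characters match literally, and the pattern is anchored with ^ and $.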
```ruby
# File lib/rbot/irc.rb, line 340
def to_irc_regexp
  regmask = Regexp.escape(self)
  regmask.gsub!(/(\\\\)?\\[*?]/) { |m|
    case m
    when /\\(\\[*?])/
      $1
    when /\\\*/
      '.*'
    when /\\\?/
      '.'
    else
      raise "Unexpected match #{m} when converting #{self}"
    end
  }
  Regexp.new("^#{regmask}$")
end
```
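A standalone copy of the conversion, handy for experimenting with ban masks; irc_glob_to_regexp is a hypothetical name. It mirrors the listing, including the ^...$ anchors; these are line anchors in Ruby, so \A and \z would be the stricter choice for input that may contain newlines.

```ruby
# Standalone version of String#to_irc_regexp
def irc_glob_to_regexp(glob)
  regmask = Regexp.escape(glob)
  # After escaping, a glob character appears as \* or \?, and an
  # IRC-escaped glob character (\* in the mask) appears as \\\*
  regmask = regmask.gsub(/(\\\\)?\\[*?]/) do |m|
    case m
    when /\\(\\[*?])/ then $1 # escaped glob: keep it literal
    when /\\\*/ then '.*'     # * matches any run of characters
    when /\\\?/ then '.'      # ? matches exactly one character
    else raise "Unexpected match #{m} when converting #{glob}"
    end
  end
  Regexp.new("^#{regmask}$")
end

ban = irc_glob_to_regexp("*!*@*.example.com")
p ban.match?("nick!user@host.example.com") # true
p ban.match?("nick!user@example.org")      # false
```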
This method wraps a nonempty String with the given prefix and postfix; an empty receiver yields a new empty String. The opts argument is currently unused.
```ruby
# File lib/rbot/core/utils/extends.rb, line 366
def wrap_nonempty(pre, post, opts={})
  if self.empty?
    String.new
  else
    "#{pre}#{self}#{post}"
  end
end
```
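A typical use is adding punctuation around an optional field without producing empty brackets. The standalone function below mirrors the method, taking the string as its first argument and omitting the unused opts; the example values are invented for illustration.

```ruby
# Standalone version of String#wrap_nonempty
def wrap_nonempty(str, pre, post)
  str.empty? ? String.new : "#{pre}#{str}#{post}"
end

topic = "Welcome to #rbot"
p "Topic" + wrap_nonempty(topic, " (", ")") # "Topic (Welcome to #rbot)"
p "Topic" + wrap_nonempty("", " (", ")")    # "Topic"
```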