Files
poky/meta/recipes-devtools/ruby/ruby/CVE-2024-49761-0007.patch
Divya Chellam 61c55b9e30 ruby: fix CVE-2024-49761
REXML is an XML toolkit for Ruby. The REXML gem before 3.3.9 has a ReDoS
vulnerability when it parses an XML that has many digits between &# and x...;
in a hex numeric character reference (&#x.... This does not happen with
Ruby 3.2 or later. Ruby 3.1 is the only affected maintained Ruby.
The REXML gem 3.3.9 or later include the patch to fix the vulnerability.

CVE-2024-49761-0009.patch is the CVE fix and rest are dependent commits.

Reference:
https://nvd.nist.gov/vuln/detail/CVE-2024-49761

Upstream-patch:
810d228523
83ca5c4b0f
51217dbcc6
7e4049f6a6
fc6cad570b
7712855547
370666e314
a579730f25
ce59f2eb1a

(From OE-Core rev: 5b453400e9dd878b81b1447d14b3f518809de17e)

Signed-off-by: Divya Chellam <divya.chellam@windriver.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
2025-01-18 06:21:02 -08:00

562 lines
22 KiB
Diff

From 370666e314816b57ecd5878e757224c3b6bc93f5 Mon Sep 17 00:00:00 2001
From: NAITOH Jun <naitoh@gmail.com>
Date: Tue, 27 Feb 2024 09:48:35 +0900
Subject: [PATCH] Use more StringScanner based API to parse XML (#114)
## Why?
Improve maintainability by optimizing the process so that the parsing
process proceeds using StringScanner#scan.
## Changed
- Change `REXML::Parsers::BaseParser` from `frozen_string_literal:
false` to `frozen_string_literal: true`.
- Added `Source#string=` method for error message output.
- Added TestParseDocumentTypeDeclaration#test_no_name test case.
- Of the `intSubset` of DOCTYPE, "<!" added consideration for processing
`Comments` that begin with "<!".
## [Benchmark]
```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
before after before(YJIT) after(YJIT)
dom 11.240 10.569 17.173 18.219 i/s - 100.000 times in 8.896882s 9.461267s 5.823007s 5.488884s
sax 31.812 30.716 48.383 52.532 i/s - 100.000 times in 3.143500s 3.255655s 2.066861s 1.903600s
pull 36.855 36.354 56.718 61.443 i/s - 100.000 times in 2.713300s 2.750693s 1.763099s 1.627523s
stream 34.176 34.758 49.801 54.622 i/s - 100.000 times in 2.925991s 2.877065s 2.008003s 1.830779s
Comparison:
dom
after(YJIT): 18.2 i/s
before(YJIT): 17.2 i/s - 1.06x slower
before: 11.2 i/s - 1.62x slower
after: 10.6 i/s - 1.72x slower
sax
after(YJIT): 52.5 i/s
before(YJIT): 48.4 i/s - 1.09x slower
before: 31.8 i/s - 1.65x slower
after: 30.7 i/s - 1.71x slower
pull
after(YJIT): 61.4 i/s
before(YJIT): 56.7 i/s - 1.08x slower
before: 36.9 i/s - 1.67x slower
after: 36.4 i/s - 1.69x slower
stream
after(YJIT): 54.6 i/s
before(YJIT): 49.8 i/s - 1.10x slower
after: 34.8 i/s - 1.57x slower
before: 34.2 i/s - 1.60x slower
```
- YJIT=ON : 1.06x - 1.10x faster
- YJIT=OFF : 0.94x - 1.01x faster
---------
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
CVE: CVE-2024-49761
Upstream-Status: Backport [https://github.com/ruby/rexml/commit/370666e314816b57ecd5878e757224c3b6bc93f5]
Signed-off-by: Divya Chellam <divya.chellam@windriver.com>
---
.../lib/rexml/parsers/baseparser.rb | 325 +++++++++---------
.bundle/gems/rexml-3.2.5/lib/rexml/source.rb | 31 +-
2 files changed, 188 insertions(+), 168 deletions(-)
diff --git a/.bundle/gems/rexml-3.2.5/lib/rexml/parsers/baseparser.rb b/.bundle/gems/rexml-3.2.5/lib/rexml/parsers/baseparser.rb
index 595669c..bc59bcd 100644
--- a/.bundle/gems/rexml-3.2.5/lib/rexml/parsers/baseparser.rb
+++ b/.bundle/gems/rexml-3.2.5/lib/rexml/parsers/baseparser.rb
@@ -1,4 +1,4 @@
-# frozen_string_literal: false
+# frozen_string_literal: true
require_relative '../parseexception'
require_relative '../undefinednamespaceexception'
require_relative '../source'
@@ -112,6 +112,19 @@ module REXML
"apos" => [/&apos;/, "&apos;", "'", /'/]
}
+ module Private
+ INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
+ TAG_PATTERN = /((?>#{QNAME_STR}))/um
+ CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
+ ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
+ NAME_PATTERN = /\s*#{NAME}/um
+ GEDECL_PATTERN = "\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
+ PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
+ ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
+ end
+ private_constant :Private
+ include Private
+
def initialize( source )
self.stream = source
@listeners = []
@@ -198,183 +211,172 @@ module REXML
#STDERR.puts @source.encoding
#STDERR.puts "BUFFER = #{@source.buffer.inspect}"
if @document_status == nil
- word = @source.match( /\A((?:\s+)|(?:<[^>]*>))/um )
- word = word[1] unless word.nil?
- #STDERR.puts "WORD = #{word.inspect}"
- case word
- when COMMENT_START
- return [ :comment, @source.match( COMMENT_PATTERN, true )[1] ]
- when XMLDECL_START
- #STDERR.puts "XMLDECL"
- results = @source.match( XMLDECL_PATTERN, true )[1]
- version = VERSION.match( results )
- version = version[1] unless version.nil?
- encoding = ENCODING.match(results)
- encoding = encoding[1] unless encoding.nil?
- if need_source_encoding_update?(encoding)
- @source.encoding = encoding
- end
- if encoding.nil? and /\AUTF-16(?:BE|LE)\z/i =~ @source.encoding
- encoding = "UTF-16"
- end
- standalone = STANDALONE.match(results)
- standalone = standalone[1] unless standalone.nil?
- return [ :xmldecl, version, encoding, standalone ]
- when INSTRUCTION_START
+ if @source.match("<?", true)
return process_instruction
- when DOCTYPE_START
- base_error_message = "Malformed DOCTYPE"
- @source.match(DOCTYPE_START, true)
- @nsstack.unshift(curr_ns=Set.new)
- name = parse_name(base_error_message)
- if @source.match(/\A\s*\[/um, true)
- id = [nil, nil, nil]
- @document_status = :in_doctype
- elsif @source.match(/\A\s*>/um, true)
- id = [nil, nil, nil]
- @document_status = :after_doctype
- else
- id = parse_id(base_error_message,
- accept_external_id: true,
- accept_public_id: false)
- if id[0] == "SYSTEM"
- # For backward compatibility
- id[1], id[2] = id[2], nil
+ elsif @source.match("<!", true)
+ if @source.match("--", true)
+ return [ :comment, @source.match(/(.*?)-->/um, true)[1] ]
+ elsif @source.match("DOCTYPE", true)
+ base_error_message = "Malformed DOCTYPE"
+ unless @source.match(/\s+/um, true)
+ if @source.match(">")
+ message = "#{base_error_message}: name is missing"
+ else
+ message = "#{base_error_message}: invalid name"
+ end
+ @source.string = "<!DOCTYPE" + @source.buffer
+ raise REXML::ParseException.new(message, @source)
end
- if @source.match(/\A\s*\[/um, true)
+ @nsstack.unshift(curr_ns=Set.new)
+ name = parse_name(base_error_message)
+ if @source.match(/\s*\[/um, true)
+ id = [nil, nil, nil]
@document_status = :in_doctype
- elsif @source.match(/\A\s*>/um, true)
+ elsif @source.match(/\s*>/um, true)
+ id = [nil, nil, nil]
@document_status = :after_doctype
else
- message = "#{base_error_message}: garbage after external ID"
- raise REXML::ParseException.new(message, @source)
+ id = parse_id(base_error_message,
+ accept_external_id: true,
+ accept_public_id: false)
+ if id[0] == "SYSTEM"
+ # For backward compatibility
+ id[1], id[2] = id[2], nil
+ end
+ if @source.match(/\s*\[/um, true)
+ @document_status = :in_doctype
+ elsif @source.match(/\s*>/um, true)
+ @document_status = :after_doctype
+ else
+ message = "#{base_error_message}: garbage after external ID"
+ raise REXML::ParseException.new(message, @source)
+ end
end
- end
- args = [:start_doctype, name, *id]
- if @document_status == :after_doctype
- @source.match(/\A\s*/um, true)
- @stack << [ :end_doctype ]
- end
- return args
- when /\A\s+/
- else
- @document_status = :after_doctype
- if @source.encoding == "UTF-8"
- @source.buffer_encoding = ::Encoding::UTF_8
+ args = [:start_doctype, name, *id]
+ if @document_status == :after_doctype
+ @source.match(/\s*/um, true)
+ @stack << [ :end_doctype ]
+ end
+ return args
+ else
+ message = "Invalid XML"
+ raise REXML::ParseException.new(message, @source)
end
end
end
if @document_status == :in_doctype
- md = @source.match(/\A\s*(.*?>)/um)
- case md[1]
- when SYSTEMENTITY
- match = @source.match( SYSTEMENTITY, true )[1]
- return [ :externalentity, match ]
-
- when ELEMENTDECL_START
- return [ :elementdecl, @source.match( ELEMENTDECL_PATTERN, true )[1] ]
-
- when ENTITY_START
- match = [:entitydecl, *@source.match( ENTITYDECL, true ).captures.compact]
- ref = false
- if match[1] == '%'
- ref = true
- match.delete_at 1
- end
- # Now we have to sort out what kind of entity reference this is
- if match[2] == 'SYSTEM'
- # External reference
- match[3] = match[3][1..-2] # PUBID
- match.delete_at(4) if match.size > 4 # Chop out NDATA decl
- # match is [ :entity, name, SYSTEM, pubid(, ndata)? ]
- elsif match[2] == 'PUBLIC'
- # External reference
- match[3] = match[3][1..-2] # PUBID
- match[4] = match[4][1..-2] # HREF
- match.delete_at(5) if match.size > 5 # Chop out NDATA decl
- # match is [ :entity, name, PUBLIC, pubid, href(, ndata)? ]
- else
- match[2] = match[2][1..-2]
- match.pop if match.size == 4
- # match is [ :entity, name, value ]
- end
- match << '%' if ref
- return match
- when ATTLISTDECL_START
- md = @source.match( ATTLISTDECL_PATTERN, true )
- raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil?
- element = md[1]
- contents = md[0]
-
- pairs = {}
- values = md[0].scan( ATTDEF_RE )
- values.each do |attdef|
- unless attdef[3] == "#IMPLIED"
- attdef.compact!
- val = attdef[3]
- val = attdef[4] if val == "#FIXED "
- pairs[attdef[0]] = val
- if attdef[0] =~ /^xmlns:(.*)/
- @nsstack[0] << $1
- end
+ @source.match(/\s*/um, true) # skip spaces
+ if @source.match("<!", true)
+ if @source.match("ELEMENT", true)
+ md = @source.match(/(.*?)>/um, true)
+ raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
+ return [ :elementdecl, "<!ELEMENT" + md[1] ]
+ elsif @source.match("ENTITY", true)
+ match = [:entitydecl, *@source.match(ENTITYDECL_PATTERN, true).captures.compact]
+ ref = false
+ if match[1] == '%'
+ ref = true
+ match.delete_at 1
end
- end
- return [ :attlistdecl, element, pairs, contents ]
- when NOTATIONDECL_START
- base_error_message = "Malformed notation declaration"
- unless @source.match(/\A\s*<!NOTATION\s+/um, true)
- if @source.match(/\A\s*<!NOTATION\s*>/um)
- message = "#{base_error_message}: name is missing"
+ # Now we have to sort out what kind of entity reference this is
+ if match[2] == 'SYSTEM'
+ # External reference
+ match[3] = match[3][1..-2] # PUBID
+ match.delete_at(4) if match.size > 4 # Chop out NDATA decl
+ # match is [ :entity, name, SYSTEM, pubid(, ndata)? ]
+ elsif match[2] == 'PUBLIC'
+ # External reference
+ match[3] = match[3][1..-2] # PUBID
+ match[4] = match[4][1..-2] # HREF
+ match.delete_at(5) if match.size > 5 # Chop out NDATA decl
+ # match is [ :entity, name, PUBLIC, pubid, href(, ndata)? ]
else
- message = "#{base_error_message}: invalid declaration name"
+ match[2] = match[2][1..-2]
+ match.pop if match.size == 4
+ # match is [ :entity, name, value ]
end
- raise REXML::ParseException.new(message, @source)
- end
- name = parse_name(base_error_message)
- id = parse_id(base_error_message,
- accept_external_id: true,
- accept_public_id: true)
- unless @source.match(/\A\s*>/um, true)
- message = "#{base_error_message}: garbage before end >"
- raise REXML::ParseException.new(message, @source)
+ match << '%' if ref
+ return match
+ elsif @source.match("ATTLIST", true)
+ md = @source.match(ATTLISTDECL_END, true)
+ raise REXML::ParseException.new( "Bad ATTLIST declaration!", @source ) if md.nil?
+ element = md[1]
+ contents = md[0]
+
+ pairs = {}
+ values = md[0].scan( ATTDEF_RE )
+ values.each do |attdef|
+ unless attdef[3] == "#IMPLIED"
+ attdef.compact!
+ val = attdef[3]
+ val = attdef[4] if val == "#FIXED "
+ pairs[attdef[0]] = val
+ if attdef[0] =~ /^xmlns:(.*)/
+ @nsstack[0] << $1
+ end
+ end
+ end
+ return [ :attlistdecl, element, pairs, contents ]
+ elsif @source.match("NOTATION", true)
+ base_error_message = "Malformed notation declaration"
+ unless @source.match(/\s+/um, true)
+ if @source.match(">")
+ message = "#{base_error_message}: name is missing"
+ else
+ message = "#{base_error_message}: invalid name"
+ end
+ @source.string = " <!NOTATION" + @source.buffer
+ raise REXML::ParseException.new(message, @source)
+ end
+ name = parse_name(base_error_message)
+ id = parse_id(base_error_message,
+ accept_external_id: true,
+ accept_public_id: true)
+ unless @source.match(/\s*>/um, true)
+ message = "#{base_error_message}: garbage before end >"
+ raise REXML::ParseException.new(message, @source)
+ end
+ return [:notationdecl, name, *id]
+ elsif md = @source.match(/--(.*?)-->/um, true)
+ case md[1]
+ when /--/, /-\z/
+ raise REXML::ParseException.new("Malformed comment", @source)
+ end
+ return [ :comment, md[1] ] if md
end
- return [:notationdecl, name, *id]
- when DOCTYPE_END
+ elsif match = @source.match(/(%.*?;)\s*/um, true)
+ return [ :externalentity, match[1] ]
+ elsif @source.match(/\]\s*>/um, true)
@document_status = :after_doctype
- @source.match( DOCTYPE_END, true )
return [ :end_doctype ]
end
end
if @document_status == :after_doctype
- @source.match(/\A\s*/um, true)
+ @source.match(/\s*/um, true)
end
begin
- next_data = @source.buffer
- if next_data.size < 2
- @source.read
- next_data = @source.buffer
- end
- if next_data[0] == ?<
- if next_data[1] == ?/
+ if @source.match("<", true)
+ if @source.match("/", true)
@nsstack.shift
last_tag = @tags.pop
- md = @source.match( CLOSE_MATCH, true )
+ md = @source.match(CLOSE_PATTERN, true)
if md and !last_tag
message = "Unexpected top-level end tag (got '#{md[1]}')"
raise REXML::ParseException.new(message, @source)
end
if md.nil? or last_tag != md[1]
message = "Missing end tag for '#{last_tag}'"
- message << " (got '#{md[1]}')" if md
+ message += " (got '#{md[1]}')" if md
+ @source.string = "</" + @source.buffer if md.nil?
raise REXML::ParseException.new(message, @source)
end
return [ :end_element, last_tag ]
- elsif next_data[1] == ?!
- md = @source.match(/\A(\s*[^>]*>)/um)
+ elsif @source.match("!", true)
+ md = @source.match(/([^>]*>)/um)
#STDERR.puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}"
raise REXML::ParseException.new("Malformed node", @source) unless md
- if md[0][2] == ?-
- md = @source.match( COMMENT_PATTERN, true )
+ if md[0][0] == ?-
+ md = @source.match(/--(.*?)-->/um, true)
case md[1]
when /--/, /-\z/
@@ -383,17 +385,18 @@ module REXML
return [ :comment, md[1] ] if md
else
- md = @source.match( CDATA_PATTERN, true )
+ md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
return [ :cdata, md[1] ] if md
end
raise REXML::ParseException.new( "Declarations can only occur "+
"in the doctype declaration.", @source)
- elsif next_data[1] == ??
+ elsif @source.match("?", true)
return process_instruction
else
# Get the next tag
- md = @source.match(TAG_MATCH, true)
+ md = @source.match(TAG_PATTERN, true)
unless md
+ @source.string = "<" + @source.buffer
raise REXML::ParseException.new("malformed XML: missing tag start", @source)
end
tag = md[1]
@@ -418,7 +421,7 @@ module REXML
return [ :start_element, tag, attributes ]
end
else
- md = @source.match( TEXT_PATTERN, true )
+ md = @source.match(/([^<]*)/um, true)
text = md[1]
return [ :text, text ]
end
@@ -462,8 +465,7 @@ module REXML
# Unescapes all possible entities
def unnormalize( string, entities=nil, filter=nil )
- rv = string.clone
- rv.gsub!( /\r\n?/, "\n" )
+ rv = string.gsub( /\r\n?/, "\n" )
matches = rv.scan( REFERENCE_RE )
return rv if matches.size == 0
rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {
@@ -498,9 +500,9 @@ module REXML
end
def parse_name(base_error_message)
- md = @source.match(/\A\s*#{NAME}/um, true)
+ md = @source.match(NAME_PATTERN, true)
unless md
- if @source.match(/\A\s*\S/um)
+ if @source.match(/\s*\S/um)
message = "#{base_error_message}: invalid name"
else
message = "#{base_error_message}: name is missing"
@@ -577,11 +579,28 @@ module REXML
end
def process_instruction
- match_data = @source.match(INSTRUCTION_PATTERN, true)
+ match_data = @source.match(INSTRUCTION_END, true)
unless match_data
message = "Invalid processing instruction node"
+ @source.string = "<?" + @source.buffer
raise REXML::ParseException.new(message, @source)
end
+ if @document_status.nil? and match_data[1] == "xml"
+ content = match_data[2]
+ version = VERSION.match(content)
+ version = version[1] unless version.nil?
+ encoding = ENCODING.match(content)
+ encoding = encoding[1] unless encoding.nil?
+ if need_source_encoding_update?(encoding)
+ @source.encoding = encoding
+ end
+ if encoding.nil? and /\AUTF-16(?:BE|LE)\z/i =~ @source.encoding
+ encoding = "UTF-16"
+ end
+ standalone = STANDALONE.match(content)
+ standalone = standalone[1] unless standalone.nil?
+ return [ :xmldecl, version, encoding, standalone ]
+ end
[:processing_instruction, match_data[1], match_data[2]]
end
diff --git a/.bundle/gems/rexml-3.2.5/lib/rexml/source.rb b/.bundle/gems/rexml-3.2.5/lib/rexml/source.rb
index db78a12..4111d1d 100644
--- a/.bundle/gems/rexml-3.2.5/lib/rexml/source.rb
+++ b/.bundle/gems/rexml-3.2.5/lib/rexml/source.rb
@@ -76,6 +76,10 @@ module REXML
end
end
+ def string=(string)
+ @scanner.string = string
+ end
+
# @return true if the Source is exhausted
def empty?
@scanner.eos?
@@ -150,28 +154,25 @@ module REXML
def read
begin
@scanner << readline
+ true
rescue Exception, NameError
@source = nil
+ false
end
end
def match( pattern, cons=false )
- if cons
- md = @scanner.scan(pattern)
- else
- md = @scanner.check(pattern)
- end
- while md.nil? and @source
- begin
- @scanner << readline
- if cons
- md = @scanner.scan(pattern)
- else
- md = @scanner.check(pattern)
- end
- rescue
- @source = nil
+ read if @scanner.eos? && @source
+ while true
+ if cons
+ md = @scanner.scan(pattern)
+ else
+ md = @scanner.check(pattern)
end
+ break if md
+ return nil if pattern.is_a?(String) && pattern.bytesize <= @scanner.rest_size
+ return nil if @source.nil?
+ return nil unless read
end
md.nil? ? nil : @scanner
--
2.40.0