This project is archived and is in readonly mode.
Marshal serialized attributes
Reported by Stephen Celis | October 8th, 2008 @ 05:00 PM | in 2.x
This is a simple patch to use Marshal instead of YAML for attribute serialization. Marshaling is significantly faster (see below), and fixes some YAML load issues (including outstanding ticket #857).
require 'benchmark'
require 'yaml'
require 'rubygems'
require 'activesupport'
yaml = YAML.dump([])
marshal = Marshal.dump([])
Benchmark.bm(35) do |x|
x.report('YAML: dump/load') {
10_000.times { YAML.load(YAML.dump([])) }
}
x.report('Marshal: dump/load') {
10_000.times { Marshal.load(Marshal.dump([])) }
}
x.report('YAML: load') {
10_000.times { YAML.load(yaml) }
}
x.report('Marshal: load (with YAML detection)') {
10_000.times { marshal.starts_with?('---') ? YAML.load(marshal) : Marshal.load(marshal) }
}
end
# user system total real
# YAML: dump/load 0.560000 0.020000 0.580000 ( 1.836370)
# Marshal: dump/load 0.040000 0.000000 0.040000 ( 0.131774)
# YAML: load 0.120000 0.020000 0.140000 ( 0.447349)
# Marshal: load (with YAML detection) 0.060000 0.000000 0.060000 ( 0.176883)
(Just to note: all tests still pass.)
Comments and changes to this ticket
-
Michael Koziarski October 8th, 2008 @ 05:23 PM
This will break every application which uses serialized data.
A patch which makes this an option could be applied, but not this.
-
Stephen Celis October 8th, 2008 @ 05:28 PM
No, it doesn't break applications because the attribute string is detected and unserialized as YAML or Marshal accordingly.
-
Michael Koziarski October 8th, 2008 @ 05:36 PM
Well, it kinda depends on your definition of breaks, but silently changing what gets written to the database is a little surprising.
serialize :foo, :with=>:marshall
would achieve the same thing without the risk, so that seems to be the best of both worlds?
-
Stephen Celis October 8th, 2008 @ 05:37 PM
Actually, let me double-check the patch. (Nursing a head cold.) I'll update the ticket accordingly.
-
Stephen Celis October 8th, 2008 @ 05:39 PM
Well, it kinda depends on your definition of breaks, but silently changing what gets written to the database is a little surprising.
I guess so, though I would favor Marshal as the default with perhaps a deprecation notice, given the performance improvement.
-
Stephen Celis October 8th, 2008 @ 05:52 PM
I've double-checked, and YAML-serialized attributes will still correctly deserialize.
I'd be happy to make :marshal an option, but still think that it would be a preferable default. Does anyone else want to weigh in on this?
-
Michael Koziarski October 8th, 2008 @ 05:55 PM
We could do it with an option. Defaulting to yaml, with a deprecation warning that in 2.3 it will default to :marshal.
Then make the switch after we branch, that'll give people plenty of warning, and let you avoid the .starts_with?("---" stuff
-
Stephen Celis October 8th, 2008 @ 06:02 PM
Frederick Cheung makes some valid points on the list:
While the option is fair enough, I don't thing all existing apps wouldn't want this turned on "silently": 1 if other people use your database yaml is ok as there are parsers for it in many languages whereas Marshal would be a PITA 2 if your existing column is not a blob column (which it wouldn't have to be previously since yaml generates plain text), the database will throw a hissy fit (or just truncate the data) when you try to insert a character that is not legal in the charset used. 3 should you be calling string_to_binary if the column supports it?
I'll work on making it an option and attach an updated patch. I'm less sure it should be a future default, though, because of the first 2 points.
-
josh October 11th, 2008 @ 07:07 PM
- Assigned user changed from josh to Michael Koziarski
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
<h2 style="font-size: 14px">Tickets have moved to Github</h2>
The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>