should be able to truncate a multibyte string to a max # of bytes
Reported by sarah (at ultrasaurus) | September 12th, 2009 @ 12:06 AM
This is a common issue when a database column is limited to a specific number of bytes but the string being stored may contain multi-byte characters. We think there should be a method to help with this. Here's our proposal (spec followed by implementation):
require File.dirname(__FILE__) + '/../spec_helper'
describe "Chars#limit_bytes" do
  it 'should return "" on ""' do
    "".mb_chars.limit_bytes(0).should == ""
    "".mb_chars.limit_bytes(1).should == ""
  end
  it 'should truncate single byte character strings as expected' do
    a = "abcd"
    a.mb_chars.limit_bytes(0).should == ''
    a.mb_chars.limit_bytes(1).should == 'a'
    a.mb_chars.limit_bytes(50).should == 'abcd'
  end
  it 'should truncate multi-byte character strings at character boundaries' do
    k = "こんいちわ"
    k.mb_chars.limit_bytes(0).should == ''
    k.mb_chars.limit_bytes(1).should == ''
    k.mb_chars.limit_bytes(3).should == 'こ'
    k.mb_chars.limit_bytes(4).should == 'こ'
    k.mb_chars.limit_bytes(5).should == 'こ'
    k.mb_chars.limit_bytes(6).should == 'こん'
    k.mb_chars.limit_bytes(7).should == 'こん'
    k.mb_chars.limit_bytes(50).should == 'こんいちわ'
  end
end
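module ActiveSupport #:nodoc:
  module Multibyte #:nodoc:
    class Chars
      # Returns the longest prefix that fits in +limit+ bytes without
      # splitting a character (String#slice is byte-based on Ruby 1.8).
      def limit_bytes(limit)
        limit -= 1 while !valid_boundary?(limit)
        s = @wrapped_string.slice(0,limit)
        s.mb_chars
      end
      # True if the first +length+ bytes decode as complete UTF-8 characters.
      def valid_boundary?(length)
        chunk = @wrapped_string.slice(0,length)
        begin
          chunk.unpack('U*')
          true
        rescue
          false
        end
      end
    end
  end
end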
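For readers on Ruby 1.9.3 or later, where String#slice counts characters rather than bytes, the same idea can be sketched with String#byteslice and String#valid_encoding?. This is only an illustration under stated assumptions: the free-standing `limit_bytes(string, limit)` below is a hypothetical helper, not the proposed Chars method and not anything in ActiveSupport, and it assumes the input string is valid in its own encoding to begin with.

```ruby
# Hypothetical stand-alone helper (not part of ActiveSupport).
# Assumes `string` is already valid in its own encoding.
def limit_bytes(string, limit)
  # Nothing to do if the whole string already fits.
  return string.dup if string.bytesize <= limit

  # byteslice returns nil for a negative length, so clamp at zero.
  limit = [limit, 0].max
  chunk = string.byteslice(0, limit)

  # Back off one byte at a time until the cut no longer splits a
  # character. The empty string is always valid, so this terminates.
  chunk = chunk.byteslice(0, chunk.bytesize - 1) until chunk.valid_encoding?
  chunk
end

limit_bytes("こんいちわ", 7)  # => "こん"
```

Backing off one byte at a time mirrors the `valid_boundary?` loop above; for UTF-8 it needs at most three steps, because a character is at most four bytes long.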
Comments and changes to this ticket
-
sarah (at ultrasaurus) September 12th, 2009 @ 12:07 AM
FYI: happy to create a patch if y'all want this change (paired with Wolfram Arnold on this)
-
Alovak September 15th, 2009 @ 05:39 AM
Yesterday I saw the truncation bug described here with a multi-byte string on Basecamp.
-
Wolfram Arnold September 15th, 2009 @ 05:55 AM
Oh, that's a good corroboration. We'll do a patch then.
-
CancelProfileIsBroken September 25th, 2009 @ 12:20 PM
- Tag set to bugmash
-
Elad Meidar September 27th, 2009 @ 07:52 PM
- Tag changed from bugmash to bugmash, multibyte
Since this is getting a bit stale, I've attached a patch that applies to both master and 2-3-stable.
-
Mike Enriquez September 27th, 2009 @ 10:02 PM
-1 I don't think this fits into the responsibility for Chars. Since we don't have a "String".limit_characters we don't need a "String".mb_chars.limit_bytes.
I haven't needed limit_bytes (or a character-based equivalent) for a database restriction, because a validation error is typically used to restrict the size. Truncating the data and saving seems application-specific and can be done by slicing and removing the leftover bytes. Note that "こんいちわ".length # => 15, which is the number of bytes.
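As a rough illustration of "slicing and removing the leftover bytes" on a current Ruby, one possible sketch uses String#byteslice followed by String#scrub (Ruby 2.1+). The helper name `slice_and_scrub` is made up for this example, and it assumes `max_bytes` is non-negative. The length of 15 quoted above matches Ruby 1.8, where String#length counts bytes; on Ruby 1.9 and later, `length` counts characters and `bytesize` gives the byte count.

```ruby
# Hypothetical helper illustrating "slice, then drop the leftover bytes".
# Assumes max_bytes >= 0 (byteslice returns nil for a negative length).
def slice_and_scrub(string, max_bytes)
  # byteslice cuts at a raw byte offset and may split the last character;
  # scrub("") then deletes any invalid byte sequence left behind.
  string.byteslice(0, max_bytes).scrub("")
end

k = "こんいちわ"
k.length                # => 5 on Ruby 1.9 and later (characters)
k.bytesize              # => 15 (bytes in UTF-8)
slice_and_scrub(k, 7)   # => "こん"
```

Note that `scrub` removes invalid bytes anywhere in the slice, not only at the cut, so this variant also silently cleans up input that was malformed before truncation.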
-
Matías Flores September 28th, 2009 @ 04:50 AM
+1 verified Elad's patch applies cleanly on both master and 2-3-stable and all tests pass.
-
sarah (at ultrasaurus) September 28th, 2009 @ 12:17 PM
Mike,
Sure you can find out the length in bytes for a multi-byte string, but to actually truncate it at a character boundary requires some effort (as you can see in the implementation above).
Validation errors are appropriate when there is user input and as a fail-safe. In the case where we ran into this, we had a field in the database which is filled in as an annotation (for administrators and for debugging) based on a different longer field. For auto-generated strings that need to be truncated, a validation isn't helpful.
Sarah
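To make the auto-generated annotation case concrete, here is a minimal sketch of fitting a derived string into a byte-limited column. The name `annotation_for` and the 10-byte limit are assumptions for illustration only and do not come from our application or the patch. It walks the characters and stops before the byte budget is exceeded, so it never has to detect a broken character afterwards.

```ruby
# Hypothetical helper: derive a short annotation from a longer field so
# that it fits a column limited to max_bytes bytes.
def annotation_for(long_text, max_bytes)
  bytes = 0
  long_text.each_char do |char|
    # Stop before the character that would overflow the byte budget.
    break if bytes + char.bytesize > max_bytes
    bytes += char.bytesize
  end
  # The cut now falls on a character boundary by construction.
  long_text.byteslice(0, bytes)
end

annotation_for("こんいちわ world", 10)  # => "こんい"
```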
-
Manfred Stienstra November 1st, 2009 @ 04:25 PM
I don't really feel the use-case, but I guess it's a valid reason. Not sure if this is something to include in core. Attached is a cleaned up version of the proposed patch.
-
Manfred Stienstra November 1st, 2009 @ 05:12 PM
- no changes were found...
-
Repository November 4th, 2009 @ 03:19 PM
- State changed from new to committed
(from [935bd0fef8e26f4ec65fe411a1d29942493f8d46]) Add ActiveSupport::Multibyte::Chars#limit.
The limit method limits the number of bytes in a string. Useful when the
storage space of the string is limited, for instance in a database column
definition.

Sharpen up the implementation of translate offset.
[#3192 state:committed] http://github.com/rails/rails/commit/935bd0fef8e26f4ec65fe411a1d299...