Feature #1592

Upgraded iSCSI drivers

Added by SZTAKI LPDS over 7 years ago. Updated over 7 years ago.

Status:ClosedStart date:10/25/2012
Priority:NormalDue date:
Assignee:Jaime Melis% Done:

0%

Category:Drivers - Auth
Target version:Release 4.0
Resolution:fixed Pull request:

Description

Dear Developers,

we upgraded the iscsi drivers to achive higher reliability and stability while working with iscsi datastores with TGT server. We use these scripts in production and we were able to start hunderds of virtual machines at once on the same datastore without a single error. The attached patch contains these modifications:
- The scripts are now uses the tgt-setup-lun script provided by the tgt package instead of the `get TID, create new target, bind new target, bind logical unit` function, so the scripts are much more readable and immune to a lot of race conditions (we attached a fine tuned tgt-setup-lun script which is fine tuned for the opennebula use case, we are working to get the modifications into the mainstream tgt software)
- Block size of 2M is much faster in most of the cases than the 64k while copying or cloning a logical volume
- tgtadm have a bug if you have a lot of iSCSI targets and want to show the configration, it basically sends an EOF character in the middle of the output stream so it is hard to process the stream with awk or grep, to work around the problem, we pipe the output through the strings command.
- Sometime it takes more than 2 seconds for the luns to appear after you log in to an iscsi target, therefore we implemented a polling algorithm to check whether the luns appeared or not before symlinking the lun to a datastore disk.
- We partitioned the Discovery - Login - Wait - Symlinking function into individual command groups while cloning or symlinking a disk for better error handling and readability.

Please check the attached patch.

prod_iscsi.patch Magnifier (7.55 KB) SZTAKI LPDS, 10/25/2012 03:23 PM

tgt-setup-lun (6.26 KB) SZTAKI LPDS, 10/25/2012 03:23 PM

merge_ssh.patch Magnifier (1.21 KB) Jaime Melis, 11/16/2012 04:57 PM

ln (2.81 KB) Jaime Melis, 11/16/2012 04:58 PM


Related issues

Related to Bug #1682: Race condition when getting TID for iSCSI volumes Closed 12/06/2012

Associated revisions

Revision 84120fc3
Added by Jaime Melis over 7 years ago

Feature #1592: Apply SZTAKI LPDS patch as is.

Revision 78e300e5
Added by Jaime Melis over 7 years ago

Feature #1592: Merge SSH connections into a single one

Revision 7890918d
Added by Jaime Melis over 7 years ago

Feature #1592: Upload a patched version of the 'tgt-setup-lun' script provided by SZTAKI LPDS.

Revision 11d0f29a
Added by Jaime Melis over 7 years ago

Feature #1592: Add TGT_SHARE_FILES.

Revision 62def952
Added by Jaime Melis over 7 years ago

Feature #1592: Check if TGTSETUPLUN is installed

Revision d28dd039
Added by Jaime Melis over 7 years ago

Feature #1592: Apply SZTAKI LPDS patch as is.

Revision 6ffadd9d
Added by Jaime Melis over 7 years ago

Feature #1592: Merge SSH connections into a single one

Revision ae7f0d72
Added by Jaime Melis over 7 years ago

Feature #1592: Upload a patched version of the 'tgt-setup-lun' script provided by SZTAKI LPDS.

Revision 0304a931
Added by Jaime Melis over 7 years ago

Feature #1592: Add TGT_SHARE_FILES.

Revision 0a5b731f
Added by Jaime Melis over 7 years ago

Feature #1592: Check if TGTSETUPLUN is installed

History

#1 Updated by Ruben S. Montero over 7 years ago

  • Target version set to Release 4.0

THANKS FOR YOUR CONTRIBUTION!

WE'll take a look at it and will integrate it for the next release

#2 Updated by Jaime Melis over 7 years ago

First of all, thanks a lot for the contribution!

I agree with all the changes you have made, I believe it can be applied almost as is. I have a few questions, though, just to clarify things:

1. The DISK_DEVNAME variable in ln and clone isn't used at all, right? Can you confirm that I can safely remove that? I'm just asking in case you left something behind in your patch.

2. One of the things we try to do in the drivers is to keep the number of SSH connections to a bare minimum. In that spirit I would like to propose to merge DISCOVER_CMD, TEST_CMD and LINK_CMD. Do you agree with this? (see the attached patch which applies on top of your patch - I haven't tested it by the way, consider it as pseudocode)

#3 Updated by Jaime Melis over 7 years ago

  • File ln added
  • Status changed from New to Assigned
  • Assignee set to Jaime Melis

I'm also attaching the final (not tested) file after applying both patches, so you don't need to bother.

#4 Updated by SZTAKI LPDS over 7 years ago

Jaime Melis wrote:

First of all, thanks a lot for the contribution!

You are welcome!

I agree with all the changes you have made, I believe it can be applied almost as is. I have a few questions, though, just to clarify things:

Thank you very much for looking at our patch

1. The DISK_DEVNAME variable in ln and clone isn't used at all, right? Can you confirm that I can safely remove that? I'm just asking in case you left something behind in your patch.

Yes we forgot about that, the point of that line was to gather the disk name in order to be able to set the disk io scheduler to "noop", because that performs way better than "cfq" or "deadline" with iscsi disks. However within our infrastructure, the noop is the default scheduler on the VM host nodes, and explicitly set the local disk schedulers to deadline that is why that part of the code is not finished. We think this is an important adjustment for production use, so please consider to add something like this "echo noop > /sys/block/$DISK_DEVNAME/queue/scheduler" to all the transfer manager drivers where iscsi disk initialization happens.

2. One of the things we try to do in the drivers is to keep the number of SSH connections to a bare minimum. In that spirit I would like to propose to merge DISCOVER_CMD, TEST_CMD and LINK_CMD. Do you agree with this? (see the attached patch which applies on top of your patch - I haven't tested it by the way, consider it as pseudocode)

We can agree with that, we detached the commands for better readability, but of course the low number of SSH connections is more important.

#5 Updated by Ruben S. Montero over 7 years ago

  • Status changed from Assigned to Closed
  • Resolution set to fixed

Changes are now in master. THANKS FOR YOUR CONTRIBUTION!

Also available in: Atom PDF